SUMMARY


The  efficient  design  of  a  free-play,  24-hour-per-day,  operational  test  (OT)  of  an 
ASW  search  system  remains  a  challenge  to  the  OT  community.  It  will  often  be  the  case 
during  an  ASW  search  OT  that  it  takes  much  more  time  to  gain  a  detection  of  the  target 
submarine  than  the  operational  test  director  (OTD)  had  expected.  At  this  point,  the  OTD, 
typically  concerned  with  the  number  of  detections  and  encounters  that  his  OT  will 
generate,  may  introduce  artificial  means  of  detecting  the  target  in  order  to  speed  up  the 
detection  process  (e.g.,  require  the  target  to  actively  ping  for  a  short  period).  We  have 
argued  that,  when  used,  the  various  artificial  means  of  speeding  up  the  detection  process 
severely  affect  the  realism  of  the  search  OT  and  limit  the  usefulness  of  the  collected  data 
with  respect  to  providing  unbiased  estimates  of  system  effectiveness. 

We  have  suggested  that  fewer  ASW  search  OT  trials,  but  with  maximum  realism 
and more data collection, can provide a far more insightful and credible assessment
than, say, twice as many unrealistic trials. Given that an OT is designed to yield fewer
well-analyzed  realistic  trials,  the  OTD  is  still  faced  with  the  problem  of  controlling  the 
OT,  that  is,  controlling  the  average  length  of  time  that  a  test  event  lasts. 

A.  PURPOSE 

This  paper  identifies  test  control  rules  that  an  OTD  can  employ  from  on-board  the 
searching platform to allow for an efficient, free-play, open-ocean (i.e., off-range),
24-hour-per-day OT. In particular, this paper documents the results of a simulation of an
ASW search OT in which test control rules are used and explores the ramifications of the
various test control rules on the number of trials expected and on the quality of the
estimates of three search-related measures of effectiveness (MOEs).

The  basic  test  control  premise  described  here  is  to  stop  the  test  event  if  the  time 
without  a  detection/classification  grows  too  long.  Furthermore,  if  this  long  period  passes 
again  without  detection/classification,  then  the  OTD  uses  a  different  search  scenario  in 
which  the  size  of  the  area  being  searched  is  shrunk.  This  process  of  observing  the  times 
to  detection/classification,  and  then  deciding  whether  to  continue  to  the  next  search  as 
previously planned, stop (“truncate”) the current trial, or truncate the current trial and
shrink  the  next  box  searched  can  be  used  to  control  ASW  search  OTs.  What  these 
“sequential  test  control  rules”  should  be  -  for  instance,  how  long  until  stopping  the  trial 
(stopping  rule)  and  how  much  should  the  box  be  shrunk  (shrinkage  rule)  -  is  the  subject 
of  much  of  this  paper. 

B.  METHODOLOGY 

A  spreadsheet  simulation  was  designed  to  allow  for  the  exploration  of  several  test 
control  rules.  In  order  to  accomplish  this,  a  straightforward  model  of  the  times  to 
detection/classification  was  developed  based  on  observations  from  a  recent  ASW  search 
OT  and  our  review  of  typical  test  plans  and  procedures.  This  model  uses  a  fitted  gamma 
distribution  to  represent  the  times  to  detection/classification  that  one  might  expect  from  a 
free-play  ASW  search  OT. 1  Our  simulation  proceeds  by  first  drawing  random  times  to 
detect/classify  from  an  appropriate  gamma  distribution.  Next,  the  given  test  control  rules 
are  applied  and  fixed  amounts  of  time  for  localization,  attack,  and  repositioning  are 
added.  The  simulation  keeps  track  of  how  many  detections  and  encounters  occur  for  a 
given test duration. In addition, estimates of three search-related MOEs - median search
rate  (MdSR),  search  rate  (SR),  and  mean  search  rate  (MSR)  -  are  compiled  for  the 
simulated  OT.2  Finally,  this  process  is  repeated  100  times  for  each  test  situation  and  the 
results  (e.g.,  times  to  detect,  number  of  trials,  MdSR,  SR,  MSR)  are  stored  as  frequency 
tables  (Appendix  B). 
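
The simulation loop described above can be sketched in a few lines of Python. This is an illustrative sketch, not the spreadsheet model itself. The gamma shape parameter (`shape=2.0`), the assumption that the mean time to detect/classify scales in proportion to the area searched, the choice to charge a truncated trial only its search time, and all function and parameter names are our assumptions for illustration. The shrinkage logic follows the rule stated in this summary: the first truncation only stops the trial, and the second and any later truncations also shrink the next box.

```python
import random
import statistics

def simulate_ot(mean_ttd_hr, test_days, overhead_hr, stop_hr, shrink_frac,
                shape=2.0, rng=None):
    """Simulate one time-terminated OT; return (encounters, truncations).

    mean_ttd_hr : mean time to detect/classify in the full-size box (hours)
    overhead_hr : fixed time for localization, attack, and repositioning
    stop_hr     : stopping rule (truncate a trial after this many hours)
    shrink_frac : shrinkage rule as a fraction, e.g. 0.25 or 0.50
    shape       : gamma shape parameter (hypothetical, not from the paper)
    """
    rng = rng or random.Random()
    hours_left = 24.0 * test_days
    area_factor = 1.0  # current box size relative to the original box
    encounters = truncations = 0
    while hours_left > 0:
        # Assumption: mean time to detect is proportional to area searched.
        scale = mean_ttd_hr * area_factor / shape
        ttd = rng.gammavariate(shape, scale)
        search_hr = min(ttd, stop_hr)
        if search_hr > hours_left:
            break  # the test period ends before this trial resolves
        hours_left -= search_hr
        if ttd <= stop_hr:
            encounters += 1
            hours_left -= overhead_hr
        else:
            truncations += 1
            if truncations >= 2:
                area_factor *= 1.0 - shrink_frac  # shrink the next box
    return encounters, truncations

def median_encounters(n_trials=100, seed=0, **kwargs):
    """Median number of encounters over n_trials simulated OTs."""
    rng = random.Random(seed)
    counts = [simulate_ot(rng=rng, **kwargs)[0] for _ in range(n_trials)]
    return statistics.median(counts)
```

For example, `median_encounters(mean_ttd_hr=15.08, test_days=8, overhead_hr=2.0, stop_hr=16.0, shrink_frac=0.5)` runs 100 simulated OTs of a System B-like performer. Because the shape parameter and scaling assumption are ours, the numbers it produces should not be expected to reproduce the frequency tables in Appendix B.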

Table  1  presents  the  various  test  situations  that  were  simulated.  Two  system 
performance  levels  were  examined:  “System  A”  -  a  system  that  performs  at  a  level 
similar to that of a system observed in a recent OT - and “System B” - a system that
performs at half the System A level. That is, the average time to detect/classify for
System A (while searching an 800 NM² box) was 7.54 hours, whereas the comparable
System B value was 15.08 hours.


1  A justification for the use of a gamma distribution is provided in Chapter II.

2  Definitions for these MOEs can be found in Chapter II.
Table 1. Test Situations That Were Examined

Test Length    Localization and    Stopping Rule             Shrinkage Rule
(Days)         Attack Time (Hr)    (Truncation Time - Hr)    (%)
-----------    ----------------    ----------------------    --------------
4 and 8        2, 4, and 8         12, 16, 20, 24            25 and 50

In  all,  96  different  test  situations  (runs)  were  simulated,  each  one  with  100  trials, 
for  a  total  of  9,600  simulated  OTs.  Figure  1  provides  a  schematic  view  of  the 
simulation’s  operation.  Based  on  the  given  set  of  test  control  rules  and  the  observed 
times  to  detect/classify,  the  feedback  loops  (shown  in  red  in  Figure  1)  serve  to  adjust  the 
size  of  the  area  searched  during  the  OT. 3 
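
The run count can be checked by enumerating the design. The sketch below assumes that the 96 runs are the full factorial of the Table 1 settings crossed with the two systems (the arithmetic is consistent with that reading); the variable names are ours.

```python
from itertools import product

systems = ("A", "B")
test_days = (4, 8)
loc_attack_hr = (2, 4, 8)
stop_hr = (12, 16, 20, 24)
shrink_pct = (25, 50)

# Every combination of system and test control settings is one run.
situations = list(product(systems, test_days, loc_attack_hr, stop_hr, shrink_pct))

trials_per_situation = 100
total_simulated_ots = len(situations) * trials_per_situation
```

Here `len(situations)` is 2 x 2 x 3 x 4 x 2 = 96 and `total_simulated_ots` is 9,600, matching the totals quoted above.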
Figure 1. Simplified View of Simulation’s Operation
(Output: Number of Encounters, Truncations, Times to Detect, Search-Related MOE Estimates)


3  Additional  details  of  this  simulation’s  operation  can  be  found  in  Chapter  II. 
C.  RESULTS


With  respect  to  the  employment  of  test  control  rules,  the  analyses  presented  in  this 
document  support  the  following  conclusions.  First,  employing  stopping  rules  for 
free-play  ASW  search  OT  can  increase  the  number  of  encounters  generated  during  the 
test  and  maintain  elements  of  test  realism.  The  use  of  such  rules  will  be  particularly 
valuable  when  the  system  under  test  performs  significantly  worse  than  pre-test 
expectations.  For  example,  with  an  assumption  of  2  hours  (on  average)  for  localization 
and  attack  and  8  days  of  testing,  and  using  a  16-hour  stopping  rule  and  a  50  percent 
shrinkage  rule,  System  A  had  a  median  value  (based  on  100  trials)  of  17  encounters, 
whereas  System  B,  with  half  the  system  performance  level,  had  a  median  value  of  15. 
(See  Figure  2.)  This  relative  test  efficiency  (i.e.,  the  encounter  sample  size  didn’t 
decrease  as  fast  as  the  system  performance)  results  from  the  16-hour  stopping  rule,  which 
truncated  some  of  the  longest  events  that  could  have  “wasted  OT  time”  and,  for  some 
trials,  led  to  smaller  areas  being  searched. 
Figure 2. Example of Results
(16-Hour Stopping Rule, 2 Hours for Localization and Attack, 50% Shrinkage Rule;
panels: 8-Day OT and 4-Day OT. Figure annotations: the rules generate relatively
similar sample sizes for sonars that perform very differently; System B, the poor
performer, has average times to detect twice as long as System A; the rules are less
effective at “saving sample” for the shorter test, where there is “not enough time to
work.”)


Longer  test  periods  (on  the  order  of  8  days  or  more)  are  more  likely  to  be 
positively  affected  by  the  test  control  rules  described  in  this  document.  Free-play  test 
durations  of  4  days  or  less  will  be  only  minimally  affected  by  the  rules  described  in  this 
document. 

The  use  of  the  12-hour  stopping  rule  with  a  50  percent  shrinkage  rule  led  to 
unrealistically  short  times  to  detect/classify  for  some  trials.  This  combination  of  rules  led 
to  situations  where  our  test  realism/vigilance  rule-of-thumb  -  “maintain  less  than  half  a 
chance  of  detection/classification  within  6  hours”  -  was  violated.  For  the  system 
performances  examined  (i.e.,  Systems  A  and  B),  stopping  rules  of  16,  20,  or  24  hours, 
used  in  concert  with  50  or  25  percent  shrinkage  rules,  appeared  satisfactory  from  this 
perspective. 
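
The realism rule of thumb quoted above amounts to requiring that the probability of detection/classification within 6 hours stay below one half. As a rough sketch of how that check could be automated, the code below evaluates the rule for a gamma time-to-detect model with an integer shape parameter (an Erlang distribution, which has a closed-form cumulative distribution). The shape value of 2, the assumption that the mean time to detect scales in proportion to the area searched, and the function names are our assumptions, not values taken from the fitted model.

```python
import math

def prob_detect_within(hours, mean_ttd_hr, shape=2):
    """P(T <= hours) for a gamma time to detect with integer shape (Erlang)."""
    x = hours * shape / mean_ttd_hr  # elapsed time divided by the scale parameter
    tail = sum(x**n / math.factorial(n) for n in range(shape))
    return 1.0 - math.exp(-x) * tail

def violates_realism_rule(mean_ttd_hr, area_factor, shape=2, window_hr=6.0):
    """True if detection within window_hr has probability of one half or more.

    area_factor is the current box size relative to the original box;
    the mean time to detect is assumed to shrink in the same proportion.
    """
    p = prob_detect_within(window_hr, mean_ttd_hr * area_factor, shape)
    return p >= 0.5
```

An OTD-side planning tool could call `violates_realism_rule` after each proposed shrinkage to flag boxes that have become too small for the search to remain realistic.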

Assuming  an  event-terminated  free-play  OT  is  to  be  conducted, 4  the  use  of  our 
test  control  rules  can  be  expected  to  save  substantial  test  time  given  that  the  system  under 
test  performs  somewhat  worse  than  expected.  We  considered  a  15-encounter  (i.e.,  events 
taken  to  completion)  OT  in  which  System  B,  the  poorer  performer,  was  tested  without 
test  control  rules.  We  also  simulated  the  same  situation  with  the  16-hour  stopping  rule 
and  the  50  percent  shrinkage  rule.  In  both  cases,  100  trials  were  run  (using  the  same 
initial  set  of  random  draws)  and  4  hours  were  assumed  for  the  average  time  to  localize, 
attack,  and  reposition.  Figure  3  shows  the  cumulative  probability  of  completing  such  a 
15-encounter  OT  as  a  function  of  the  number  of  test  days.  Without  the  test  control  rules 
(curve  shown  in  black),  it  takes  between  about  8  and  18  days,  with  a  median  value  of 
about  12  days,  to  complete  this  free-play  OT.  By  using  the  16-hour/50  percent  shrinkage 
rule  (curve  shown  in  red),  this  15-encounter  OT  of  System  B  takes  between  about  7  and 
12  days,  with  a  median  value  of  about  8  days.  That  is,  using  the  median  values  for 
comparison,  these  specific  test  rules  allow  this  OT  of  System  B  to  be  completed  in  33 
percent less time. Figure 3 also shows the cumulative probability of completing a
15-event (encounters plus truncations) OT with the test control rules (curve shown in blue).
This  OT  is  completed  in  between  6  and  9  days,  with  a  median  value  of  about  7  days. 


4  Most  of  this  study  assumed  a  time-terminated  free-play  OT. 


Figure 3. Time Savings Due to the Use of Test Control Rules:
Number of Days Required for a 15-Event OT of System B


With  respect  to  the  search-related  MOEs  that  were  investigated: 

•  In  the  case  of  SR  and  MdSR,  stopping  rules  of  16  and  20  hours  appeared 
to  represent  a  reasonable  variance-reducing/realism-maintaining 
compromise.  The  increased  sample  sizes  associated  with  using 
16-  and  20-hour  stopping  rules  (relative  to  no  stopping  rule)  led  to 
decreased  variance  associated  with  the  estimates  of  these  MOEs.  The 
longer  stopping  rule  that  was  examined,  24  hours,  led  to  fewer  observed 
detections/classifications, particularly for System B, and hence, greater
variance  in  the  estimation  of  search-related  MOEs.  As  discussed  above, 
the  shorter  stopping  rule,  12  hours,  often  led  to  violations  of  our  test 
realism  rule. 

•  The  MSR,  because  of  the  large  variance  associated  with  its  estimation, 
does  not  appear  to  be  a  good  choice  for  a  search-related  MOE. 

•  Given  the  employment  of  the  test  control  rules  described  in  this  document, 
both  MdSR  and  SR  appear  to  represent  satisfactory  search-related  MOEs. 
Whereas  MdSR  can  be  directly  estimated  from  the  observed  events,  an 
MLE  procedure  should  be  used  to  include  censored  data  in  estimates  of 
SR. 
•  Given a “set of observations” (trial), a parametric bootstrap technique can
be  used  to  estimate  the  given  search-related  MOE  and  to  attach  confidence 
intervals.  In  addition,  in  the  case  of  MdSR  and  SR,  this  technique  can  be 
used  to  arrive  at  statistically  based  conclusions  (e.g.,  hypothesis  testing) 
relative  to  predefined  thresholds.5 
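
The last two points can be illustrated with a short sketch. For simplicity it swaps in an exponential model for the times to detect/classify, rather than the fitted gamma distribution used in this study, because the exponential MLE with right-censored (truncated) trials has a simple closed form: the estimated mean time to detect is the total search time, including truncated trials, divided by the number of detections. The sketch also assumes that every trial used the same box size and the same stopping rule. The function names are ours.

```python
import random

def sr_mle(detect_times, n_truncated, stop_hr, area_nm2):
    """Search rate (NM^2 per hour) from a censored exponential MLE."""
    total_hr = sum(detect_times) + n_truncated * stop_hr
    mean_hat = total_hr / len(detect_times)  # MLE of mean time to detect
    return area_nm2 / mean_hat

def bootstrap_ci(detect_times, n_truncated, stop_hr, area_nm2,
                 n_boot=2000, level=0.90, rng=None):
    """Parametric bootstrap percentile interval for the search rate."""
    rng = rng or random.Random()
    n = len(detect_times) + n_truncated
    mean_hat = (sum(detect_times) + n_truncated * stop_hr) / len(detect_times)
    estimates = []
    for _ in range(n_boot):
        # Draw a synthetic OT from the fitted model and re-apply truncation.
        draws = [rng.expovariate(1.0 / mean_hat) for _ in range(n)]
        seen = [t for t in draws if t <= stop_hr]
        if not seen:
            continue  # no detections: the estimate is undefined
        estimates.append(sr_mle(seen, n - len(seen), stop_hr, area_nm2))
    estimates.sort()
    lo = estimates[int((1 - level) / 2 * len(estimates))]
    hi = estimates[int((1 + level) / 2 * len(estimates)) - 1]
    return lo, hi
```

A threshold comparison could then ask whether a predefined search rate threshold falls below the lower end of the interval. Replacing the exponential draws with draws from a fitted gamma distribution would bring the sketch closer to the model used here, at the cost of a numerical MLE.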

D.  CONCLUSIONS 

Given  the  set  of  test  control  rules  examined  and  for  free-play  ASW  search  OTs 
conducted  on  systems  with  performance  expected  to  be  similar  to  that  of  systems  recently 
tested  by  the  Navy,  the  16-  to  20-hour  stopping  rules  appear  best.  For  example,  for  the 
16-hour  stopping  rule,  if  16  hours  pass  after  the  start  of  the  search  without 
detection/classification  of  the  target,  the  OTD  should  stop  the  trial  and  proceed  to  the 
next  planned  trial.  If  this  period  of  time  without  detection/classification  passes  again  on  a 
different  trial  during  the  OT,  the  area  size  to  be  searched  should  be  decreased  by  25  or  50 
percent.  Additional  OTD-directed  truncations  should  be  followed  by  further  area 
shrinkages. 

The  examination  of  two  “systems,”  A  and  B,  whose  performance  varied  by  a 
factor  of  two,  demonstrated  the  robust  behavior  of  the  identified  test  control  rules.  That 
is,  the  same  rules  were  used  in  either  case  (System  A  or  B),  yet  acceptable  numbers  of 
encounters  and  expected  levels  of  realism  were  maintained. 

Given  the  usage  of  test  control  rules,  the  search-related  measures,  SR,  defined  as 
the  area  searched  divided  by  the  average  time  to  detect/classify,  and  MdSR,  defined  as 
the  area  searched  divided  by  the  median  time  to  detect/classify,  could  be  used  as 
high-level  MOEs  to  aid  assessments  of  system  search  effectiveness  and,  in  particular,  for 
comparisons  to  predefined  thresholds  or  previous  system  performance. 
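
As a small worked illustration of these two definitions, the sketch below computes SR and MdSR from a list of observed times to detect/classify in a box of known size. It ignores truncated trials, which, as noted in the results, should be handled with an MLE procedure when estimating SR. The function names are ours.

```python
from statistics import mean, median

def search_rate(area_nm2, times_hr):
    """SR: area searched divided by the average time to detect/classify."""
    return area_nm2 / mean(times_hr)

def median_search_rate(area_nm2, times_hr):
    """MdSR: area searched divided by the median time to detect/classify."""
    return area_nm2 / median(times_hr)
```

Using the average times quoted in the summary, an 800 NM² box and a 7.54-hour average time to detect/classify give an SR of about 106 NM² per hour for System A, and 15.08 hours gives about 53 NM² per hour for System B.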


5  Parametric  bootstrap  techniques  use  the  structure  of  an  assumed  specific  underlying  distributional 
model.  Alternatively,  given  such  a  construct,  large-sample  approximations  based  on  the  same 
parametric  formulation  could  also  be  used  to  construct  statistical  confidence  intervals.  Nonparametric 
approaches,  including  the  nonparametric  bootstrap,  offer  other  possible  confidence  interval 
approaches. 

I.  INTRODUCTION 


A  previous  IDA  study  (Ref.  1-1)  examined  Operational  Test  and  Evaluation 
(OT&E)  concepts  for  attack  submarines.  This  previous  effort  suggested  that  the 
operational  effectiveness  and  suitability  of  an  attack  submarine  be  evaluated  as  a  function 
of  its  specific  mission  and  that  a  disciplined,  logical  approach  that  uses  a  framework 
composed  of  assessments  of  critical  operational  issues  (COIs)  be  adopted.  Five  attack 
submarine  COIs  were  identified:  COVERTNESS,  SEARCH,  ATTACK, 
CONNECTIVITY,  and  AVAILABILITY.1  Assessments  of  these  COIs  were  meant  to 
proceed  via  the  estimation  of  a  few  high-level  measures  of  effectiveness  (MOEs)  and  a 
large  number  of  measures  of  performance  (MOPs).  Furthermore,  the  study 
recommended  that  a  comparative  evaluation  methodology  be  applied  at  all  levels.  For 
example,  estimates  of  MOEs  and  MOPs  for  a  given  submarine  (or  submarine  system) 
would  be  compared  to  Navy-defined  thresholds  and  to  estimates  of  current  system 
performance.  These  estimates  of  current  system  performance  might  come  from  an 
analysis  of  past  test,  exercise,  or  operations  data  or,  preferably,  from  “side-by-side” 
operational  testing  (Ref.  1-2).  The  simulation  study  described  in  this  paper  is  associated 
with  the  Anti-Submarine  Warfare  (ASW)  mission  and  the  SEARCH  COI.2 

A.  NEED  FOR  TEST  REALISM 

Previous  operational  testing  of  the  submarine  or  sonar  search  capability  has  not 
always  been  as  realistic,  and  hence  useful  to  decision  makers,  as  one  might  like.  Often, 
in  the  past,  ASW  OTs  (and  exercises)  have  involved  a  repetitive,  forced  encounter 
design.  These  forced  encounter  geometries  are  designed  to  increase  the  probability  of 
detection  in  a  “reasonable”  length  of  time.  Typically,  the  target  ship  and  test  platform 
(i.e.,  the  searcher)  are  placed  at  different  corners  of  a  small  box,  perhaps  an  underwater 
acoustic  range,  and  the  target  ship  is  required  to  follow  a  given  track  (or  be  at  “tie  points” 


1  The  “all  capitals”  notation  was  used  in  Reference  1-1  to  designate  the  word  as  meaning  the  COI. 

2  MOEs  and  MOPs  to  support  assessments  of  the  ASW  SEARCH  COI  have  been  identified  (Ref.  1-1). 
A key search-related MOE, which is being used by both the New Attack Submarine and Seawolf
programs  to  aid  ASW  assessments,  is  search  rate,  defined  in  units  of  square  nautical  miles  per  hour.  A 
detailed  definition  for  search  rate  is  given  in  Chapter  II  of  this  document. 
at specific times) starting at the commencement of the exercise (COMEX). Depending
on  the  circumstances,  the  test  platform  may  be  totally  free  to  maneuver  or  be  given  a 
track  to  follow.  Sonar  contact  on  the  part  of  either  ship  may  permit  either  to  maneuver  at 
will.  This  design  has  been  used  to  enhance  the  probability  that  the  encounter  will  occur 
in  the  center  of  the  box  (or  range)  where  weapons  can  be  fired  and  more  easily  recovered. 

This  repetitive  forced  encounter  design  can  lead  to  unrealistic  conditions.  For 
instance,  Commanding  Officers  are  not  free  to  operate  their  submarines  in  the  manner  in 
which  they  would  employ  them  in  real  combat  and  they  may  be  confined  to  following 
demotivating  “rudder  orders”  (particularly  the  target).  If  a  forced  encounter  geometry  is 
used  to  replicate  an  area  clearance,  a  typical  ASW  search  scenario,  only  the  bow  aspects 
of  radiated  noise  are  important,  and  there  are  generally  no  detections  on  opening 
geometries.  This  method  does  not  yield  stern  aspects,  and  thus  fails  to  test  the  ability  of 
the  test  platform  to  “catch  up”  and  exploit  any  proposed  tactical  speed  enhancements. 
Furthermore,  the  sonarman  is  faced  with  ever-increasing  signal-to-noise  ratios  in  a  forced 
encounter  design.  There  is  a  suspicion  that  these  conditions  train  him  to  wait  when  in 
doubt  about  detection  or  classification,  expecting  conditions  to  improve. 

The  repetitive  COMEX-finish  exercise  (FINEX)  sequence  of  “alert”  periods  leads 
to  many  artificialities.  Sometimes,  following  COMEX,  additional  personnel  who  are 
more  experienced,  including  the  Commanding  Officer,  will  muster  at  watch  stations, 
thereby  enhancing  ship  performance.  Potentially  detectable  housekeeping  operations, 
such  as  air  charges,  sanitary  tank  blowing  (pumping),  and  dumping  garbage,  are  delayed 
until  after  FINEX.  The  worst  aspect  of  the  COMEX-FINEX  routine  is  the  periodic 
expectation  of  a  target  detection  (i.e.,  non-random,  somewhat  predictable,  times  from 
COMEX  to  detection). 

A  previous  study  of  a  surface  ship  sonar  OT  showed  that  measures  of  detection 
performance  (i.e.,  probability  of  detection,  percent  holding  time,  time  to  initial  detection) 
varied  widely  among  test  phases,  depending,  in  large  part,  on  how  highly  structured  the 
particular  test  phase  was.  For  example,  the  observed  probability  of  detection  during  the 
highly  structured  BARSTUR/BSURE3  range  phase  was  twice  the  value  of  that  observed 
during  the  open  ocean  relatively  “free  play”  test  phase  conducted  in  the  mid-Atlantic 
(Refs.  1-3  through  1-6).  During  the  highly  structured,  forced  encounter-like  phase,  half  of 
the  initial  detections  came  within  16.5  minutes  of  COMEX!  Similarly,  during  the  range 


3  BARSTUR = Barking Sands Tactical Underwater Range; BSURE = Barking Sands Underwater Range
Expansion. 
phase of the AN/BSY-1 Operational Evaluation (OPEVAL), detections were generally
called  within  a  few  minutes  of  COMEX  (Refs.  1-7  and  1-8).  This  latter  OPEVAL  also 
illustrated  the  problem  of  trying  to  test  a  long-range  sensor,  in  this  case  the  TB-23  towed 
hydrophone  array,  on  a  relatively  confined  underwater  acoustic  range. 

In  addition,  there  are  some  cases  in  which,  as  excessive  time  elapses  after 
COMEX  but  before  the  predefined  FINEX,  the  intensity  of  the  sonar  search  rises  to  a 
maximum  until  the  target  is  found.  This  artificial  operator  vigilance  has  been  observed  in 
several  OTs. 

With  respect  to  the  operational  testing  of  the  ASW  search  mission,  we  have 
recommended  (Ref.  1-1)  that  tests  be  conducted  in  the  open  ocean,  “round-the-clock” 
(i.e.,  24  hours  per  day)  and  designed  to  allow  for  free  play  of  both  the  target  and  the 
searcher  (test  platform).  For  instance,  the  target  would  be  given  a  realistic  threat  mission 
for  the  particular  scenario  under  test  and  not  simply  be  playing  the  target  and  “waiting  to 
get  shot.”  We  concluded  that  conducting  operational  tests  of  sonars/submarines  in  this 
way  and  using  a  comparative  evaluation  strategy  while  measuring  many  MOPs  would 
produce  a  credible  assessment;  thus  the  OT  would  represent  a  useful  tool  for  the 
appropriate  decision  maker. 

B.  MOTIVATION  FOR  THIS  STUDY 

Efficiently  designing  a  practical  free-play,  24-hour-per-day,  open-ocean  OT  of  an 
ASW  search  system  (e.g.,  submarine  or  sonar)  represents  a  challenge  for  the  OT 
community.  A  similar  challenge  exists  for  the  training  and  tactics  development 
communities.  In  the  case  of  an  OT,  the  operational  test  director  (OTD),  typically  a 
lieutenant  from  OPTEVFOR,  will  have  the  additional  responsibility  of  controlling  this 
test  so  that  a  maximum  of  information  is  obtained,  in  as  realistic  a  fashion  as  possible. 
The  OTD’s  test  control  decisions  must  be  made  at  sea,  in  real-time,  typically  while 
underway  on  the  searching  submarine,  and  with  little  communication  to  outside  testers  or 
operators. 

In  the  past,  OTDs  have  often  been  concerned  with  collecting  enough  data,  that  is, 
having  enough  detections  and  encounters  on  which  to  base  their  assessments.  Of  course, 
the  previously  discussed  forced  encounter  test  design  can  alleviate  this  concern  but  at  the 
great  cost  of  test  realism  and,  ultimately,  evaluation  credibility.  It  is  not  unusual  that, 
early  in  an  OT,  little  sonar  contact  is  held,  perhaps  going  more  than  a  day  without  a 
detection.  This  lack  of  detection  becomes  of  great  concern  to  the  OTD.  The  OTD  does 
not want to return to OPTEVFOR, after “spending” all of his test resources, to report that
only  one  or  two  trials  (detections  and  encounters)  were  accomplished  during  his  OT. 
This  small  sample  size  might  preclude  COMOPTEVFOR  from  resolving  system  COIs 
and  issuing  a  final  report.  That  is,  more  OT  would  likely  be  required. 

At  this  point  in  the  OT  (i.e.,  after  some  long  period  without  a  detection),  artificial 
means  of  detecting  the  target  submarine  are  often  introduced  by  the  OTD.  Perhaps 
augmentation  to  increase  the  radiated  noise  levels  is  turned  on  or  increased  to 
unrealistically  loud  levels.4  If  the  run  had  not  been  a  free-play  event  and  the  OTD  knew 
the approximate location of the target, “hints” might be given (e.g., “try looking to the
southeast”). Targets have been known to introduce sound shorts, transmit active pings
(in  the  hope  of  being  detected),  and  create  artificial  transients  (“go  bang  a  hammer  on  the 
torpedo  tube  door”)  to  aid  the  searcher’s  classification  efforts.5  Obviously,  these 
methods  of  speeding  up  the  detection/classification  process  are  unrealistic  and  render  the 
search-related  data  that  are  collected  of  little  value. 

The  above  concern  with  having  enough  “trials”  seems  to  be  associated  with  a 
somewhat  misguided  approach  to  dealing  with  the  inherent  variance  associated  with  the 
search  process.  The  belief,  apparently,  is  that,  given  enough  trials,  reasonable  average 
values  of  MOEs  (e.g.,  probability  of  detection)  can  be  obtained.  Of  course,  there  are 
many  causes  of  the  variance  associated  with  estimates  of  search-related  MOEs.  The 
acoustic  environment,  crew  performance,  and  searcher  and  target  tactics  can  all 
contribute  to  this  variance.  One  cannot  reasonably  expect  to  do  the  hundreds  or  even 
thousands  of  trials  that  would  be  required  to  sort  out  or  average  out  all  of  the  potential 
causes  of  variance.  The  current  approach  seems  to  consider  the  causes  of  this  variance  as 
somehow  random  processes  that  are  out  of  our  control  and,  furthermore,  of  no  particular 
interest. 

We  have  suggested  a  new  approach.  Advances  in  information  technology  have 
led  to  the  possibility  of  recording  far  more  data  in  an  unintrusive  manner  during  the  OT. 
For  instance,  high-density  digital  recorders  can  collect  raw  acoustic  data,  and  built-in 
automatic  data  loggers  can  keep  track  of  sonarman  actions.  These  advances  can  allow 
individual  searches  (i.e.,  trials)  to  be  much  more  valuable  to  the  tester.  Instead  of  simply 


4  The  use  of  externally  mounted  augmentation  devices  has  led  to  numerous  problems  that  have 
confounded  the  results  of  both  the  AN/BSY-1  and  AN/BQQ-5D  OPEVALs  (Ref.  1-9  and  references 
therein). 

5  For  instance,  see  page  5  of  Ref.  1-10. 
estimating one or two high-level MOEs, many MOPs can be estimated. For instance,
“sonar  equation  parameters”  such  as  recognition  differential  can  be  estimated  from 
acoustic  recordings  and  provide  insight  into  system  performance.  The  conclusion  is  that 
fewer  trials,  but  with  maximum  realism  and  more  data  collection,  can  provide  for  a  far 
more  insightful  and  credible  assessment  than,  say,  twice  as  many  unrealistic  trials.6 

Even  given  the  above  expectation  that  fewer  well-analyzed  realistic  trials  are 
more  valuable  than  larger  numbers  of  unrealistic  trials,  the  OTD  will  still  need  a 
mechanism,  or  rules,  to  allow  some  measure  of  control  over  the  average  length  of  time 
that  an  encounter  lasts.  Controlling  this  average  length  of  time,  from  on-board  the 
searching  platform  during  the  actual  OT,  while  maintaining  elements  of  operational 
realism  represents  a  potential  obstacle  to  the  employment  of  free-play  OT  designs. 

On-board  the  searching  platform,  often  a  submarine,  the  OTD  may  have  accurate 
knowledge  of  very  few  parameters.  For  instance,  detection  ranges  will  not  necessarily  be 
accurately  known  until  weeks  after  the  OT  when  the  search  platform/target  position 
reconstructions  are  complete.  The  time  from  COMEX  to  detection/classification 
represents  one  of  the  few,  perhaps  only,  sources  of  information  that  are,  in  general, 
relatively  accurately  known  in  real-time  by  the  OTD.  It  is  these  times,  and  the 
information  that  we  can  extract  from  them,  that  we  propose  as  a  basis  for  our  practical 
test  control  rules.  This  paper  identifies  practical  test  control  rules  that  an  OTD  can 
employ  from  on-board  the  searching  platform  to  allow  for  an  efficient  free-play, 
open-ocean,  24-hour-per-day  OT.  In  addition,  this  paper  characterizes  some  of  the 
properties  of  these  test  control  rules. 

The  basic  idea  is  to  stop  the  test  if  the  time  without  a  detection/classification 
grows too long.7 Furthermore, if this long period passes again without
detection/classification, then the OTD needs to use a different search scenario in which the size of
the  area  being  searched  is  shrunk.  This  process  of  observing  the  times  to 
detection/classification,  and  then  deciding  whether  to  continue  to  the  next  search  as 
previously  planned,  stop  (“truncate”)  the  current  trial,  or  truncate  the  current  trial  and 
shrink  the  next  box  searched  can  be  used  to  control  ASW  search  OTs.  What  these 


6  Of  course,  an  infinite  number  of  totally  unrealistic  trials  would  be  of  no  value,  statistical  or  otherwise. 

7  This  period  of  time  (to  detection/classification)  might  be  longer  than  expected  for  several  reasons.  For 
example,  the  sonar  system  may  not  be  as  effective  as  previously  believed,  the  crew  may  not  have  been 
fully  trained,  the  target  may  be  quieter  than  expected,  or  the  acoustic  environment  (noise  field  and 
transmission  loss)  may  be  more  challenging  than  predicted. 
“sequential test control rules” should be - for instance, how long until stopping the trial
and  how  much  should  the  box  be  shrunk  -  is  the  subject  of  much  of  this  paper. 
II.  METHODOLOGY


In  this  chapter,  we  describe  the  procedures  followed,  assumptions  made,  and 
calculations  performed,  in  order  to  develop  and  study  test  control  rules  for  ASW  search 
OT.  Basically,  we  simulate  an  operational  test  of  an  ASW  search  system  for  the 
“one-on-one”  case  (i.e.,  submarine  versus  submarine).1  We  develop  a  model  for  the  times 
to  detection/classification  for  a  free-play,  24-hour-per-day,  open-ocean,  ASW  search  OT. 
Next,  we  use  this  model  to  generate  pseudo-random  variates  to  serve  as  realistic  times  to 
detection/classification.  These  times,  with  the  reasonable  addition  of  time  for 
localization,  attack,  and  repositioning  after  the  encounter,  are  then  summed  until  the 
desired  number  of  test  days  are  used  up.  All  along,  we  follow  potential  test  control  rules 
that  define  when  the  OTD  should  stop  a  trial  and/or  “shrink”  the  search  area  and  by  how 
much. 

Our  discussion  begins  with  a  brief  review  of  idealized  random  search  theory. 

A.  RANDOM  SEARCH  THEORY 

When a ship or submarine is conducting a random search of an area for a target
whose location is uniformly distributed over the ocean, the time between detections will
be exponentially distributed with a mean time to detection (1/λ) dependent upon the
searcher’s effective detection range and the target and searcher speeds. For a given
search time (t), the probability density function is given as:

$P(t) = \lambda e^{-\lambda t}$    Equation II-1

This “exponential detection law” was described in detail by B. O. Koopman in
1946 (Ref. II-1).2 In addition to a uniform distribution of target density,3 this model
assumes that the relative motions of the searcher and target are random (e.g.,
uncorrelated) and that the size of the area is much larger than the effective detection


1 This methodology could be extended to the “one-on-many” or “many-on-one” cases.

2 The searcher’s performance (e.g., detection range) and the target and searcher speeds are “wrapped up”
within λ.

3 That is, the target’s location is equally likely anywhere within the area to be searched.
range of the search platform. If the target can detect the searcher before the searcher can
detect  the  target,  then  target  motion  can  become  correlated  (e.g.,  the  target  may  choose  to 
avoid  the  searcher),  and  this  model  will  not  necessarily  be  a  good  representation  of  the 
search  process.  If  the  search  area  and  effective  detection  range  are  of  similar  sizes,  then 
the  searcher  will  be  “looking”  in  areas  outside  of  the  defined  search  area  during  some 
periods  of  his  random  search,  areas  where  target  density  is  necessarily  zero,  and,  again, 
this  exponential  search  model  may  be  of  limited  value.4  With  respect  to  mission  and  test 
planning,  these  assumptions  have  often  been  acceptable,  and  random  search  theory  has 
been  widely  applied.5 

Koopman expresses the cumulative probability of detection (Prob[detect < t])
when the searching is done continuously under unchanging conditions as:

$CumP(W, L) = 1 - e^{-(WL/A)}$    Equation II-2

where W = the effective sweepwidth (2 x effective detection range6), L = total length of
the observer’s path (relative speed x search time), and A = area size. By setting search
rate (SR) = W x relative speed, we can restate this equation, for a given search time (t), as:

$CumP(SR) = 1 - e^{-(SR \times t / A)}$    Equation II-3
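As a quick numerical illustration of Equation II-3, the sketch below evaluates the cumulative
probability of detection for a given search rate, search time, and area size. The function name
cum_prob_detect is ours, and the values SR = 150 NM2/Hr and A = 800 NM2 are assumed purely for
illustration.

```python
import math

def cum_prob_detect(search_rate, search_time, area):
    """Equation II-3: 1 - exp(-SR * t / A)."""
    return 1.0 - math.exp(-search_rate * search_time / area)

# Illustrative values (assumed): SR = 150 NM2/Hr, A = 800 NM2
for hours in (1, 5, 10, 20):
    print(hours, round(cum_prob_detect(150.0, hours, 800.0), 3))
```

At a search time equal to A/SR, the exponent is -1, so the cumulative probability is 1 - 1/e, or
roughly 63 percent, whatever the particular values of SR and A.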


Furthermore, if SR is defined as the size of the area to be searched divided by the average
time to a detection,7 then a natural estimate of the cumulative probability of detection can
be based on the observed times to detection as follows:

$EstCumP(t) = 1 - e^{-t / \left( \frac{1}{n} \sum_{i=1}^{n} t_i \right)}$    Equation II-4


4 This phenomenon is sometimes referred to as an “edge effect.”

5 A few studies familiar to the authors are given in Refs. II-2 through II-5. Undoubtedly, there are many
more applications of random search theory.

6 This effective detection range is based on the simplified concept of a “definite range law detector.”
(See Ref. 1-1, page 20.) Often the sonar’s computed median detection range (MDR), based on an
estimation of the sensor's acoustic figure-of-merit (FOM) and the environment-specific transmission
loss curve (as a function of range), is used as a surrogate for effective detection range.

7 This formulation of search rate and its relationship to sweepwidth has been previously discussed (Refs.
II-1, II-6, and II-7).
In Equation II-4, n = the number of observed detections. Figure II-1 shows this classic
formulation of the cumulative probability of detection for one value of λ.

Figure II-1. Classic Exponential Random Search Model
(Plot of cumulative probability versus search time in hours.)

Since λ can be estimated as SR/A, variations in λ (or SR) lead to linear changes
in SR (or λ) for a given constant A. Note that 1/λ is the mean time to detection. The
implication, for instance, is that doubling the area size (for a given sensor) or halving the
searcher performance (i.e., search rate) will double the value of 1/λ. Therefore, doubling
the area size to be searched without improving the sensor leads to a mean time to
detection that is twice as long as the case for the initial area size (for this ideal case).
Figure II-2 illustrates the effect of changes in λ on the cumulative probability curves.

Figure II-2. Cumulative Probability of Detection as a Function of λ (= SR/A)
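The linear scaling of mean time to detection with area size can be checked with a small
simulation. This is a sketch under stated assumptions: the function name, the sample size, and
the values SR = 150 NM2/Hr and A = 800 NM2 are ours, chosen only for illustration. The same
random seed is used for both area sizes so that the comparison isolates the effect of the area.

```python
import random
import statistics

def simulated_mean_time(search_rate, area, n_draws=100_000, seed=1):
    """Average of exponential times to detection with rate lambda = SR / A."""
    rng = random.Random(seed)
    lam = search_rate / area
    return statistics.fmean(rng.expovariate(lam) for _ in range(n_draws))

base = simulated_mean_time(150.0, 800.0)      # theory: 800 / 150 hours
doubled = simulated_mean_time(150.0, 1600.0)  # theory: 1600 / 150 hours
print(round(base, 2), round(doubled, 2), round(doubled / base, 2))
```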
B. OBSERVATIONS FROM A RECENT OPERATIONAL TEST


This section describes some results from a 24-hour-per-day, open-ocean,
relatively free-play operational test of a submarine sonar that was conducted in 1994.8
After a conceptual model for the times to detection/classification is developed in the next
section, these 1994 OT results will be used to “fit” a specific model. Table II-1 presents
some of the search-relevant results.


Table II-1. Measured Search Durations and Search Rates

Trial #   Search Duration (t_i)   Area Size (a_i)   Search Rate (sr_i = a_i/t_i)
          (minutes)               (NM2)             (NM2/Hr)
1         251                     400               95.6
2         23                      400               1,043.5
3         403                     400               59.6
4         157                     400               152.9
5         104                     400               230.8
6         354                     400               67.8
7         326                     400               73.6
8         242                     400               99.2
9         124                     800               387.1
10        115                     800               417.4
11        1,332                   800               36.0
12        309                     800               155.3
13        298                     800               161.1

The  measured  search  duration  is  best  characterized  as  a  time  from  COMEX 
to  detection/classification.  Importantly,  the  times  to  “detection”  are  not  necessarily 
known.  Rather,  the  time  at  which  a  signal  or  signals  are  classified  to  the  degree  that 


8  Some  classified  details  of  this  testing  can  be  found  in  Reference  II-8  and  references  therein. 
some action by the search platform can occur may be the more relevant measure of ASW
search effectiveness. Often, detection and classification appear to occur at the same time.
One can imagine a signal being “detected” by the sonarman or perhaps his computer, but
not recognized as a target of interest until some later time. The times in Table II-1
represent  the  times  to  detection/classification  for  those  detections/classifications  that 
turned  out  to  be  of  the  submarine  of  interest  (i.e.,  the  target).  In  our  first  departure  from 
the  ideal  “random  search”  case,  we  recognize  that  the  measured  search  durations  actually 
represent  the  time  from  COMEX  to  a  detection/classification  (vice  simply  the  time  to 
detection).  We  justify  further  use  of  random  search  theory  by  arguing  that  an  analogy  to 
the  definite  range  law  detector,  a  “definite  range  law  classifier”  (effective  classification 
sweepwidth)  exists,  with  all  the  same  attendant  limitations  and  assumptions  (as  described 
above). 

Table  II-2  presents  the  average  time  to  detection/classification  based  on  the 
observed  data.9  The  average  time  to  detection/classification  appears  to  be  about  twice 
(within  7  percent  of  twice)  as  long  in  the  800  NM2  box  as  in  the  400  NM2  box.  This 
suggests  that,  for  these  operational  test  data,  the  linear  relationship  (discussed  earlier) 
between  searcher  performance  and  area  size  holds.  (For  instance,  edge  effects  did  not 
appear  to  be  important  for  these  data.)  Assuming  that  this  linear  relationship  roughly 
holds, we have normalized the times to detection/classification (t_i) for an 800 NM2 area
size  and  present  the  average  search  duration  for  all  thirteen  observations  in  the  last 
column. 


Table II-2. Average Time to Detection/Classification (hours)

Parameter       Area Size = 400 NM2   Area Size = 800 NM2   All (Normalized at 800 NM2)
Sample Size     8                     5                     13
Average {t_i}   3.88                  7.26                  7.56
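The averages in Table II-2 can be recomputed directly from the Table II-1 data. The sketch below
assumes the linear normalization described above (each time is scaled by 800/a_i); the variable
and function names are ours.

```python
# (search duration in minutes, area size in NM2) for the 13 trials of Table II-1
TRIALS = [(251, 400), (23, 400), (403, 400), (157, 400), (104, 400), (354, 400),
          (326, 400), (242, 400), (124, 800), (115, 800), (1332, 800),
          (309, 800), (298, 800)]

def mean_hours(minutes):
    """Average of a list of durations given in minutes, returned in hours."""
    return sum(minutes) / len(minutes) / 60.0

avg_400 = mean_hours([t for t, a in TRIALS if a == 400])
avg_800 = mean_hours([t for t, a in TRIALS if a == 800])
# Linear normalization to an 800 NM2 box: scale each time by 800 / a_i
avg_all = mean_hours([t * 800 / a for t, a in TRIALS])

print(f"400 NM2: {avg_400:.2f} hr, 800 NM2: {avg_800:.2f} hr, normalized: {avg_all:.2f} hr")
print(f"ratio of 800 NM2 to 400 NM2 averages: {avg_800 / avg_400:.2f}")
```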

Several measures of search effectiveness can be computed from the data shown in
Table II-1. Three of these measures, search rate (SR), median search rate (MdSR), and
mean search rate (MSR), are defined below and presented in Table II-3.


9  The  inverse  of  this  average  time  to  detect/classify  can  be  thought  of  as  an  estimate  of  the  searcher’s 
characteristic  “detection/classification  rate”  (for  a  given  area  size  or,  alternatively,  target  density). 
$SR = \frac{A}{\frac{1}{n} \sum_{i=1}^{n} \frac{A}{a_i} t_i}$    Equation II-5a

$MdSR = \mathrm{Median}\left\{ \frac{a_1}{t_1}, \frac{a_2}{t_2}, \ldots, \frac{a_n}{t_n} \right\}$    Equation II-5b

$MSR = \frac{1}{n} \sum_{i=1}^{n} \frac{a_i}{t_i}$    Equation II-5c

In these equations, A = the area size of interest (e.g., to be normalized to), a_i = the size of
the ith area searched, t_i = the time from COMEX to detection/classification of the ith search,
and n = the number of searches. Note that the calculation of SR generally requires the (linear)
normalization (“A/a_i” factor in Equation II-5a) of times to detect/classify if the area sizes
are changed during the OT.

Table II-3. Estimates of Search Effectiveness for 13 Observations

Parameter   Value (NM2/Hr)
SR          105.8
MdSR        152.9
MSR         229.2
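The three measures of Table II-3 can be recomputed from the Table II-1 data with a few lines of
code. This is a sketch of Equations II-5a through II-5c as we read them, with A taken as 800 NM2;
the variable names are ours.

```python
import statistics

# (search duration in minutes, area size in NM2) for the 13 trials of Table II-1
TRIALS = [(251, 400), (23, 400), (403, 400), (157, 400), (104, 400), (354, 400),
          (326, 400), (242, 400), (124, 800), (115, 800), (1332, 800),
          (309, 800), (298, 800)]
A = 800.0  # area size of interest (NM2)

times_hr = [t / 60.0 for t, _ in TRIALS]
areas = [a for _, a in TRIALS]
n = len(TRIALS)

# Equation II-5a: area divided by the average normalized time to detect/classify
sr = A / (sum(A / a * t for a, t in zip(areas, times_hr)) / n)

# Equations II-5b and II-5c: median and mean of the per-trial search rates
rates = [a / t for a, t in zip(areas, times_hr)]
mdsr = statistics.median(rates)
msr = statistics.fmean(rates)

print(f"SR = {sr:.1f}, MdSR = {mdsr:.1f}, MSR = {msr:.1f} NM2/Hr")
```

Because MSR averages the individual rates, a single very short search (trial 2, at 23 minutes)
pulls it well above SR, which averages the times before taking the ratio.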

Given the above estimates of search effectiveness measures and the 13 observed
times to detection/classification, one can compute approximate confidence intervals (or
alternatively, conduct specific hypothesis tests). Conventional parametric techniques for
computing confidence intervals require that the observations correspond to a random
sample from a known probability distribution and that the exact sampling distribution for
the given test statistic (e.g., mean, median, maximum) is known. The latter requirement,
for the exact sampling distribution, cannot, in general, be met. However, using the 13
observations of t_i, one can compute approximate confidence intervals for the above
estimates using resampling techniques such as the bootstrap.

To generate “nonparametric bootstrap” confidence intervals, we first randomly
draw (with replacement) 13 t_i's from the observed sample. From these 13 t_i's, or bootstrap
sample, we compute the parameters of interest, in this case, SR, MdSR, and MSR. We
repeat this process, in this case 2,000 times, and then order these bootstrap estimates of
the particular measures of search effectiveness. An approximate confidence interval of
α% is represented by the endpoints of the middle 2,000 x (α/100) bootstrap estimates. This
technique is referred to as the “percentile interval” method.10
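The percentile interval procedure just described can be sketched as follows for the SR measure.
Several details here are our assumptions rather than the documented procedure: each bootstrap
draw resamples a whole trial (its time together with its area size), the function names are
hypothetical, and the random seed is arbitrary. Results will therefore be close to, but not
identical to, any tabulated intervals.

```python
import random
import statistics

# (search duration in minutes, area size in NM2) for the 13 trials of Table II-1
TRIALS = [(251, 400), (23, 400), (403, 400), (157, 400), (104, 400), (354, 400),
          (326, 400), (242, 400), (124, 800), (115, 800), (1332, 800),
          (309, 800), (298, 800)]
A = 800.0

def search_rate(sample):
    """Equation II-5a applied to a list of (minutes, area) pairs."""
    mean_norm_hr = statistics.fmean(A / a * t / 60.0 for t, a in sample)
    return A / mean_norm_hr

def percentile_interval(stat, data, level, n_boot=2000, seed=0):
    """Endpoints of the middle n_boot * level ordered bootstrap estimates."""
    rng = random.Random(seed)
    estimates = sorted(stat(rng.choices(data, k=len(data))) for _ in range(n_boot))
    n_tail = round(n_boot * (1.0 - level) / 2.0)  # estimates trimmed from each end
    return estimates[n_tail], estimates[n_boot - n_tail - 1]

for level in (0.80, 0.90, 0.95):
    lo, hi = percentile_interval(search_rate, TRIALS, level)
    print(f"{level:.0%} C.I. for SR: {lo:.1f} - {hi:.1f}")
```

The same function can be reused for MdSR or MSR by passing a different statistic.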

Table II-4 presents these nonparametric bootstrap (approximate) confidence
intervals for the three measures of Table II-3 (for α values of 80, 90, and 95%). The
interval lengths associated with SR are the shortest. The interval lengths associated with
MSR are the longest (3 times that of SR). The 80% confidence interval length associated
with MdSR is similar to that for SR. However, the 90% and 95% interval lengths are
about 1.5 times longer for MdSR vice SR. These results suggest that estimates of SR,
and to a degree MdSR, will have lower variance than MSR.


Table II-4. Nonparametric Approximate Confidence Intervals (C.I.)

% C.I.   SR             MdSR           MSR
80       83.5 - 145.0   95.6 - 161.1   141.4 - 327.2
90       77.7 - 157.8   73.6 - 230.8   127.6 - 360.5
95       73.4 - 173.3   73.6 - 230.8   116.5 - 390.1

C.  DEVELOPMENT  OF  A  TIME  TO  DETECTION/CLASSIFICATION 
MODEL  FOR  AN  ASW  SEARCH  OT 

Significantly,  there  was  one  feature  of  this  1994  operational  test  that  did  not 
satisfy  the  assumptions  of  ideal  random  search  theory.  After  each  encounter,  the  OTD, 
via  the  test  plan,  repositioned  the  searcher  and  target  to  the  starting  points  for  the  next 
trial.  These  starting  points  (locations  at  COMEX)  were  designed  such  that  they  would 
place  the  searcher,  necessarily,  over  the  acoustic  detection/classification  horizon. 
Therefore,  no  detections  could  occur  at  the  shortest  time  intervals.  Rather,  there  must  be 
some  minimum  time  delay  (perhaps  when  the  searcher  and  target  are  heading  directly  at 
each  other  at  their  highest  speeds)  until  the  target  gets  within  the  searcher’s 
detection/classification  range  (e.g.,  definite  range  law  classifier).  We  might  envision  this 
situation  as  still  leading  to  an  exponential  distribution  of  times  to  detection/classification, 
but with some initial time delay. Equation II-6 captures this delayed exponential
situation, where C = the time delay.

$P(t) = \lambda e^{-\lambda (t - C)}$ for t > C    Equation II-6 (footnote 11)


10 See Ref. II-9, Chapter 13, and, for additional general information on the bootstrap, Refs. II-10 and
II-11.
If one takes λ = SR/A = 150/800, as in Figure II-1, and, in addition, adds a 1.0
hour time delay (C = 1), then the density function shown in Figure II-3 results.

Figure II-3. Delayed Exponential Model
(Plot of probability, 0.00 to 0.20, versus time to detection/classification, 0 to 10 hours.)
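The delayed exponential model can be sketched in code as a density function plus a random draw
that adds the delay to an ordinary exponential time. The function names and the sample size are
ours; λ = 150/800 per hour and C = 1.0 hour are the values quoted in the text.

```python
import math
import random

def delayed_exp_pdf(t, lam, delay):
    """Delayed exponential density: lam * exp(-lam * (t - delay)) for t > delay, else 0."""
    return lam * math.exp(-lam * (t - delay)) if t > delay else 0.0

def delayed_exp_draw(rng, lam, delay):
    """One random time: the delay plus an exponential time with rate lam."""
    return delay + rng.expovariate(lam)

lam, delay = 150.0 / 800.0, 1.0
rng = random.Random(2)
draws = [delayed_exp_draw(rng, lam, delay) for _ in range(50_000)]
sample_mean = sum(draws) / len(draws)

# No draw can be shorter than the delay; the mean should sit near delay + 1/lam
print(min(draws) >= delay, round(sample_mean, 2), round(delay + 1.0 / lam, 2))
```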

We  also  recognize  that  the  searcher’s  actual  detection/classification  performance 
may  vary  from  trial  to  trial.  Perhaps  the  environment  gets  louder  or  quieter,  or  the  target 
radiates  more  or  less  sound,  or  the  sonar  operator’s  recognition  differential  changes  from 
trial  to  trial  and  thus  impacts  the  searcher’s  detection/classification  range.  We  can  model 
this variance as a random variable added to the mean time to detect/classify (1/λ = mean
time). For this examination, we chose this random variable from a Normal distribution
with mean (μ) equal to zero and standard deviation equal to σ.12

Next,  one  might  consider  the  initial  time  delays  as  related  to  the  relative  target 
speeds,  the  initial  COMEX  separations  after  repositioning,  and  the  searcher’s 
detection/classification  range.  For  a  typical  OT,  the  separations  at  COMEX  will  vary 
with  each  OPTEVFOR  run  plan.  There  may  be  a  few  run  plans  that  have  the  searcher 
and  target  at  approximately  opposite  corners  of  the  box  (biggest  separation),  and  a  few 
that  put  the  players  on  opposite  sides  of  the  box,  and  still  more  that  place  them  just  over 
the  acoustic  horizon  (perhaps  across  some  imaginary  boundary).  These  variations  in 
repositioning  separations  can  be  modeled  as  a  finite  number  of  possible  repositioning 
separations.  In  the  present  case,  we  envision  three  separations:  long  range  (opposite 
corners),  medium  range  (opposite  long  sides  of  a  rectangular  box),  and  short  range 
(opposite  short  sides  of  a  rectangular  box).13  In  addition,  searcher  detection/classification 

11 This delayed exponential is also referred to as a two-parameter exponential distribution.

12  We  chose  a  Normal  distribution  in  order  to  represent  the  combination  of  the  many  factors  that  can  lead 
to  variance  in  system  detection/classification  time. 

13 For a 40 NM x 20 NM (800 NM2) rectangular box, one can imagine separations at COMEX of about
40 NM (opposite corners), 30 NM (opposite long sides), and 15 NM (opposite short sides). At a
reasonable maximum relative closure speed of 20 knots (10 knots searcher and 10 knots target), this
would imply minimum time delays of 2.00, 1.50, and 0.75 hours for the three separation cases
described above, respectively.
performance (discussed above) can lead to variance in the expected value of the time
delay (C). Hence, this variance in time delay can be modeled as a random variable added
to a nominal time delay, with the nominal time delay chosen from some finite set of
repositioning separations. Again, a Normal distribution with μ = 0 and a standard
deviation = σ is used to generate the random variables for addition to the nominal time.
Note that the same standard deviation (and random draw) is associated with both the
average time to detect/classify (1/λ) and the time delay (C). Thus, 1/λ and C are
correlated in this model.14 That is, $P(t) = \lambda e^{-\lambda (t - C)}$ where t > C,
1/λ = 1/λ_0 + D, C = C_0 + D, and D is a random variable drawn from a Normal
distribution (0, σ^2).

Figure II-4 illustrates the above model for six cases (two each with the initial
nominal time delay (C_0) set at 0.75, 1.50, and 2.00 hours; see footnote 13). In
Figure II-4, 1/λ_0 = A/SR = 800/150 = 5.33 and σ = 0.5. The σ value of 0.5 was chosen
solely for illustration.

Figure II-4. Six Delayed (Two-Parameter) Exponential Distributions
(Plot of six density curves versus time to detection/classification in hours; the legend lists
the model parameters {C} and [1/λ] for each curve.)


14  This  correlation  is  due  to  the  fact  that  the  searcher  performance  (or  detection/classification  range) 
affects  both  the  exponential  decay  and  the  time  delay.  For  instance,  increased  ambient  noise  would 
lead  to  a  decrease  in  the  expected  detection  range  and  affect  the  exponential  decay  term.  In  addition, 
this same increased ambient noise and concomitant decreased detection range would, on average, cause
the  initial  time  delay  to  be  longer  since  the  target  would  need  to  get  closer  to  the  searcher  before  a 
detection  could  be  made.  In  any  case,  removing  this  correlation,  that  is,  using  independent  random 
draws,  does  not  affect  the  outcome  of  our  argument  -  that  a  gamma  distribution  represents  a  good 
model  for  representing  the  times  to  detection/classification  expected  from  an  OT.  See  the  ensuing 
discussion. 
The impact of the different initial searcher-target separations can be seen (Figure
II-4) as different time delays to the onset of first expected time to detection/classification.
In addition, the variability in 1/λ (the mean time to detect/classify) can be seen in the
different decay rates of the six curves. We might think of any given search OT as leading
to observed times “drawn” from a combination of some unknown number of these
delayed exponential curves. Continuing with these thoughts, we generated 30,000 times
to detect/classify from 30 delayed exponential distributions (prepared as described above,
i.e., ten each with nominal reposition separation times of 0.75, 1.50, and 2.00 hours).
Figure II-5 presents the resultant histogram of the combination (or equally weighted
mixture) of 30 delayed exponential distributions.
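A mixture of this kind can be sketched as follows. This is an illustration under several
assumptions of ours, not a reconstruction of the original calculation: 1,000 draws are taken
from each of the 30 distributions, the nominal mean time is 800/150 hours with σ = 0.5 (the
illustrative values quoted earlier), a negative perturbed delay is clipped to zero, and the
perturbed mean time is kept positive by a small floor. The function name and seed are
hypothetical.

```python
import random

def mixture_draws(nominal_delays=(0.75, 1.50, 2.00), per_delay=10, draws_each=1000,
                  nominal_mean=800.0 / 150.0, sigma=0.5, seed=3):
    """Times to detect/classify from an equally weighted mix of delayed exponentials."""
    rng = random.Random(seed)
    times = []
    for c0 in nominal_delays:
        for _ in range(per_delay):
            d = rng.gauss(0.0, sigma)                 # one draw D shared by 1/lambda and C
            mean_time = max(nominal_mean + d, 1e-6)   # assumed floor keeps the mean positive
            delay = max(c0 + d, 0.0)                  # assumed: negative delays clipped to 0
            times.extend(delay + rng.expovariate(1.0 / mean_time)
                         for _ in range(draws_each))
    return times

times = mixture_draws()

# Crude text histogram in 2-hour bins (first eight bins only)
counts = {}
for t in times:
    start = int(t // 2) * 2
    counts[start] = counts.get(start, 0) + 1
for start in sorted(counts)[:8]:
    print(f"{start:2d}-{start + 2:2d} hr: {counts[start]}")
```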


Figure II-5. Histogram of 30 (x1000) Random Draws from a
Combined Delayed Exponential Distribution
(Histogram of number of random draws versus time to detection/classification in hours.)


The  above  histogram  suggests  a  probability  density  function  for  the  times  to 
detect/classify  for  a  free-play  search  OT.  The  initial  time  delay  associated  with  this 
function  followed  by  a  relatively  smooth  increase  in  density  and,  finally,  by  essentially 
exponential decay suggested to us that the gamma distribution could be used to model
these  times  to  detect/classify.  Using  a  mathematical  representation  or  model,  in  this  case 
a  gamma  distribution,  to  represent  these  times  to  detect/classify  is  a  way  to  simulate  a 
search  OT  (in  a  reasonably  efficient  manner). 

Intuitively, one can imagine the times to detection/classification (t_i's) distributed in
a manner analogous to the rth event of a Poisson process. For example, first the target's
acoustic energy must be detected and displayed by the sonar. Next, after examining this
acoustic energy, enough information, perhaps several characteristic tones of sufficient
strength, must be identified by the sonarman to make a classification. This waiting time
to the rth event in a series of events happening in accordance with a Poisson process (at a
constant rate of events per unit time) obeys a gamma probability law (Ref. II-12).
Alternatively, one might consider the t_i's as arising from a mixture of independent
standard exponential variables (X_1, ..., X_r) (perhaps with the time delays and time decays
arising from independent exponential distributions); the probability density function
of their linear combination is then represented by a general gamma (or general Erlang)
distribution (Ref. II-13).

Gamma  distributions  have  been  used  as  representations  of  many  physical 
situations.  In  particular,  they  have  been  used  to  make  realistic  adjustments  to  exponential 
distributions in representing lifetimes and other random processes in time (Ref. II-13,
page 343). The next section describes the gamma distribution and provides “fits” to
observations. 

D.  FITS  TO  A  GAMMA  DISTRIBUTION 

The probability density function for a gamma distribution is:

$g(t; \alpha, \beta) = \frac{1}{\beta^{\alpha} \Gamma(\alpha)} t^{\alpha - 1} e^{-t/\beta}, \quad 0 < t < \infty$    Equation II-7

where α is a shape parameter (α > 0) and β is a scale parameter (β > 0). For integer α, the
gamma function Γ(α) = (α-1)!. Furthermore, for α = 1, the gamma distribution reduces to the
exponential distribution with β = 1/λ (Refs. II-13 and II-14). When α > 1, the gamma
distribution takes on the basic shape shown in Figure II-6.

Figure II-6. Shape of Gamma Distribution for α = 1.5 and β = 5.33
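The gamma density is straightforward to evaluate numerically. The sketch below (function name
ours) works on the log scale, checks that the shape value α = 1 recovers the exponential density
with λ = 1/β, and prints a few points along the curve for the shape and scale values quoted in
the Figure II-6 caption.

```python
import math

def gamma_pdf(t, alpha, beta):
    """Gamma density with shape alpha and scale beta; zero for t <= 0."""
    if t <= 0:
        return 0.0
    log_pdf = ((alpha - 1.0) * math.log(t) - t / beta
               - alpha * math.log(beta) - math.lgamma(alpha))
    return math.exp(log_pdf)

# With alpha = 1 the gamma density equals the exponential density with lambda = 1/beta
lam = 150.0 / 800.0
print(gamma_pdf(2.0, 1.0, 1.0 / lam), lam * math.exp(-lam * 2.0))

# A few points along the curve for shape 1.5 and scale 5.33
for t in (0.5, 2, 5, 10, 20):
    print(t, round(gamma_pdf(t, 1.5, 5.33), 4))
```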

A comparison of Figures II-6 and II-5 suggests that the gamma distribution may
be  adequate  for  modeling  the  times  to  detect/classify  of  a  free-play  search  OT  with 
repositioning  over  the  acoustic  horizon  after  each  encounter.  That  is,  the  combination  of 
several  delayed  exponential  distributions,  which  can  be  thought  of  as  contributing  to  the 
observed  times  to  detect/classify,  can  be  represented  (approximately)  by  a  single  gamma 
distribution. 

We can estimate the parameters of the gamma distribution based on the observed
times to detect/classify reported in Table II-1. First, the “method of matching moments”
can be used (Ref. II-13). For this technique, the average and variance of the times to
detect/classify are related to the parameters, α and β, of the gamma distribution as shown
below.

$\beta' \alpha' = \text{sample average}$    Equation II-8a

$\beta'^{2} \alpha' = \text{sample variance}$    Equation II-8b

The primes (') on α and β are meant to distinguish the estimator from the parameter.
Noting that the average time to detect/classify is 7.56 hours (normalized to an 800
NM2 box) and the variance is 35.13 hours squared,15 one can compute α' and β' by
simultaneously solving the two equations shown above. Following this method leads to
α' = 1.63 and β' = 4.65.
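The matching-moments calculation can be reproduced from the Table II-1 data. Dividing the
variance equation by the average equation gives the scale estimate, and the shape estimate then
follows from the average. The variable names below are ours.

```python
import statistics

# (search duration in minutes, area size in NM2) for the 13 trials of Table II-1
TRIALS = [(251, 400), (23, 400), (403, 400), (157, 400), (104, 400), (354, 400),
          (326, 400), (242, 400), (124, 800), (115, 800), (1332, 800),
          (309, 800), (298, 800)]

# Times in hours, linearly normalized to an 800 NM2 box
norm_hours = [t / 60.0 * 800.0 / a for t, a in TRIALS]

mean = statistics.fmean(norm_hours)
var = statistics.variance(norm_hours)   # sample variance (n - 1 denominator)

beta_mm = var / mean        # (beta^2 * alpha) / (beta * alpha) = beta
alpha_mm = mean / beta_mm   # beta * alpha = sample average

print(f"mean = {mean:.2f} hr, variance = {var:.2f} hr^2")
print(f"alpha' = {alpha_mm:.2f}, beta' = {beta_mm:.2f}")
```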

A second technique to estimate the parameters of the gamma distribution is to
minimize the squared differences between the cumulative gamma distribution and the
observed cumulative probability of detection/classification as a function of time. Figure
II-7 plots the ordered normalized (to 800 NM2) times to detect/classify (from Table II-1).
This provides an empirical cumulative probability of detection/classification curve as a
function of time. Using the above “least-squares” approach, one can solve for α' and β'.16
The solid line in Figure II-7 corresponds to the cumulative gamma distribution fitted in
this way.


Figure II-7. Cumulative Probability Distribution for Times to Detect/Classify (Normalized to
800 NM2) and Cumulative Gamma Distribution (Fit With Least-Squares)

A final technique, widely accepted and considered theoretically superior, is the
method of maximum likelihood estimation (MLE) (Ref. II-15). In this case, the
likelihood function (L) to be maximized is

$L(\alpha, \beta \mid data) = \prod_{i=1}^{n} g(t_i; \alpha, \beta)$    Equation II-9

where g is the gamma density function (Equation II-7).


15 Sample variance is computed as
$\left[ n \sum_{i=1}^{n} t_i^{2} - \left( \sum_{i=1}^{n} t_i \right)^{2} \right] / \left[ n(n - 1) \right]$,
where n = the number of observations.

16 The Microsoft EXCEL™ Solver function was used to determine iteratively the least-squares solution. In
order to ensure the correct shape, α was constrained to be greater than 1.

Again, one can iteratively solve this equation, and obtain estimates of α and β.17
Table II-5 presents the estimates of α and β computed by the three methods outlined
above. Figure II-8 compares the three “fits” to the observed data via a plot of the
cumulative probability of detection/classification. As can be seen in Figure II-8, the
MLE and matching moments methods yield very similar results for these data.18


Table II-5. Estimates of α and β

Method              α'     β'
Matching Moments    1.63   4.65
Least-Squares       1.41   5.03
MLE                 1.66   4.54

Figure II-8. Comparison of Various Fits to a Gamma Distribution
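A maximum likelihood fit of the gamma distribution to the normalized Table II-1 times can be
sketched without a spreadsheet solver. The approach below is ours and is only one of several
possible: for a fixed shape the likelihood is maximized by setting the scale to the sample mean
divided by the shape, so a one-dimensional ternary search over the shape suffices. Unlike the
solver runs described in the footnotes, no constraint on the shape is imposed here, so small
differences from the tabulated MLE values are possible.

```python
import math

# (search duration in minutes, area size in NM2) for the 13 trials of Table II-1
TRIALS = [(251, 400), (23, 400), (403, 400), (157, 400), (104, 400), (354, 400),
          (326, 400), (242, 400), (124, 800), (115, 800), (1332, 800),
          (309, 800), (298, 800)]
times = [t / 60.0 * 800.0 / a for t, a in TRIALS]   # hours, normalized to 800 NM2

def log_likelihood(alpha, beta, data):
    """Natural log of the gamma likelihood for shape alpha and scale beta."""
    return sum((alpha - 1.0) * math.log(t) - t / beta
               - alpha * math.log(beta) - math.lgamma(alpha) for t in data)

def fit_gamma_mle(data, lo=0.05, hi=50.0, iterations=200):
    """For a fixed alpha the best beta is mean / alpha, so search over alpha only."""
    mean = sum(data) / len(data)

    def profile(a):
        return log_likelihood(a, mean / a, data)

    for _ in range(iterations):          # ternary search on a single-peaked function
        m1, m2 = lo + (hi - lo) / 3.0, hi - (hi - lo) / 3.0
        if profile(m1) < profile(m2):
            lo = m1
        else:
            hi = m2
    alpha = (lo + hi) / 2.0
    return alpha, mean / alpha

alpha_mle, beta_mle = fit_gamma_mle(times)
print(f"alpha' = {alpha_mle:.2f}, beta' = {beta_mle:.2f}")
```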


17 Microsoft EXCEL™ Solver was used for the iterative calculations. Again, α was constrained to be
greater than 1.

18 Note that the observed average (normalized to 800 NM2) time to detect/classify was 7.56 hours (Table
II-2). The MLE estimate of the mean time to detect/classify, β'α' = 7.54 hours, is in good (<1 percent
difference) agreement.
An additional advantage of the MLE technique is that censored data, for instance,
for trials in which the OTD stops the test at 20 hours without a detection/classification,
can be included in the likelihood estimation. A modification of Equation II-9 that
incorporates a “survival probability” (G) can be used to permit censored data
observations (Ref. II-16):

$L(\alpha, \beta \mid data) = \prod_{i=1}^{n} \left[ g(t_i; \alpha, \beta) \right]^{I_i} \left[ G(m; \alpha, \beta) \right]^{1 - I_i}$    Equation II-10

The indicator (I_i) is as follows: I_i = 1 if a detection/classification occurs and 0 if
the trial is truncated (i.e., if no detection/classification time is observed because of trial
truncation). The parameter “m” of Equation II-10 is the stopping time (e.g., 20 hours: if
no detection/classification is observed within 20 hours, stop the trial). The survival
probability is calculated as shown below with g being the gamma density function
(Equation II-7):

$G(m; \alpha, \beta) = \int_{m}^{\infty} g(t; \alpha, \beta) \, dt$    Equation II-11

where m is the time to stopping the trial without a detection/classification (e.g., 20 hours).
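The censored likelihood can be sketched on the log scale as follows. Everything here beyond the
two equations is our assumption: the function names, the use of a power series for the lower
tail of the gamma distribution (adequate for moderate values of m/β, such as a 20-hour stopping
time with β near 5), and the small data set in the usage example, which is hypothetical and not
taken from any test.

```python
import math

def gamma_log_pdf(t, alpha, beta):
    """Log of the gamma density with shape alpha and scale beta."""
    return ((alpha - 1.0) * math.log(t) - t / beta
            - alpha * math.log(beta) - math.lgamma(alpha))

def gamma_survival(m, alpha, beta, terms=500):
    """Probability that a gamma time exceeds m, via a series for the lower tail."""
    x = m / beta
    if x <= 0.0:
        return 1.0
    term = 1.0 / alpha
    total = term
    for k in range(1, terms):
        term *= x / (alpha + k)
        total += term
    lower = total * math.exp(alpha * math.log(x) - x - math.lgamma(alpha))
    return max(0.0, 1.0 - lower)

def censored_log_likelihood(alpha, beta, observations, stop_time):
    """Log of the censored likelihood; observations are (time, detected) pairs."""
    total = 0.0
    for t, detected in observations:
        if detected:
            total += gamma_log_pdf(t, alpha, beta)
        else:
            total += math.log(gamma_survival(stop_time, alpha, beta))
    return total

# Hypothetical data: three detections and one trial stopped at 20 hours
obs = [(3.5, True), (8.0, True), (12.2, True), (20.0, False)]
print(round(censored_log_likelihood(1.66, 4.54, obs, stop_time=20.0), 3))
```

This function could be maximized over the shape and scale with any general-purpose optimizer;
the choice of optimizer is left open here.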

Based on the above discussions, we chose the MLE technique for all further
fitting requirements. For example, the parametric bootstrap calculations of Chapter III
rely on this method (Equation II-10).

It is important to acknowledge that the model of times to detect/classify described
above has the sole purpose (at least for the moment) of generating realistic
detection/classification times and allowing for the efficient simulation of various
free-play test control rules in order to improve future ASW search OTs. That is, after the OT
is over, this sort of model may be of little value in gaining insight into system-specific
performance (i.e., explaining the performance of the past). Rather, the myriad sources of
data (acoustic recordings, sonar log books, searcher and target position reconstructions),
which should be available after the OT, offer the best opportunity to examine system
capabilities and develop refined models of system-specific performance.19


19 Similar sentiments were expressed, somewhat more eloquently, in 1946 (Ref. II-1, page 34).
E. CONFIDENCE INTERVALS USING A PARAMETRIC BOOTSTRAP

At the end of Section B of this chapter, approximate confidence intervals were
computed for various search measures using a nonparametric bootstrap technique. Given
the arguments of Sections C and D, we may now compute “parametric bootstrap”
estimates (Ref. II-9, page 53) of these confidence intervals by randomly drawing times
(t_i's) from the MLE-fitted gamma density function (α' = 1.66 and β' = 4.54).20 The
confidence intervals computed in this way have the advantage, relative to the
nonparametric bootstrap intervals, of including the information associated with the
conduct of the OT that led to the general shape of the distribution (i.e., the chosen gamma
distribution model). Using the data from Table II-1, Table II-6 compares the parametric
(in boldface) and nonparametric bootstrap confidence intervals for three parameters of
interest.


Table II-6. Approximate Confidence Intervals (C.I.):
Parametric (in Boldface) and Nonparametric Bootstrap

% C.I.   Method          SR             MdSR           MSR
80       Parametric      82.9 - 143.5   93.7 - 194.5   139.0 - 399.5
         Nonparametric   83.5 - 145.0   95.6 - 161.1   141.4 - 327.2
90       Parametric      77.2 - 158.3   85.0 - 221.1   125.1 - 544.3
         Nonparametric   77.7 - 157.8   73.6 - 230.8   127.6 - 360.5
95       Parametric      73.2 - 168.3   78.1 - 245.8   115.1 - 687.4
         Nonparametric   73.4 - 173.3   73.6 - 230.8   116.5 - 390.1

For  this  data  set,  the  parametric  and  nonparametric  techniques  appear  to  be  in 
good  agreement  for  the  SR  and  MdSR  measures.21  With  respect  to  MSR,  differences 
between  techniques  are  observed  at  the  higher  end  of  the  interval. 
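A parametric bootstrap of this kind can be sketched with the standard library's gamma
generator. The assumptions are ours: each simulated time is treated as a search of an 800 NM2
box (so its search rate is 800 divided by the time), the function names are hypothetical, and
the seed is arbitrary, so the intervals will be close to, but not identical to, tabulated
values.

```python
import random
import statistics

ALPHA, BETA = 1.66, 4.54   # MLE-fitted gamma shape and scale quoted in the text
AREA = 800.0               # simulated times are treated as searches of an 800 NM2 box

def measures(times_hr):
    """SR, MdSR, and MSR for a list of times (hours), all at the same area size."""
    rates = [AREA / t for t in times_hr]
    return {"SR": AREA / statistics.fmean(times_hr),
            "MdSR": statistics.median(rates),
            "MSR": statistics.fmean(rates)}

def parametric_intervals(level, n_obs=13, n_boot=2000, seed=4):
    """Percentile intervals from bootstrap samples drawn from the fitted gamma."""
    rng = random.Random(seed)
    boot = [measures([rng.gammavariate(ALPHA, BETA) for _ in range(n_obs)])
            for _ in range(n_boot)]
    n_tail = round(n_boot * (1.0 - level) / 2.0)
    result = {}
    for name in ("SR", "MdSR", "MSR"):
        ordered = sorted(b[name] for b in boot)
        result[name] = (ordered[n_tail], ordered[n_boot - n_tail - 1])
    return result

for level in (0.80, 0.90, 0.95):
    row = parametric_intervals(level)
    print(f"{level:.0%}: " + ", ".join(f"{k} {lo:.1f} - {hi:.1f}"
                                       for k, (lo, hi) in row.items()))
```

The wide upper end for MSR arises because an occasional very short simulated time produces a
very large individual search rate, which dominates the mean of the rates.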

The parametric bootstrap percentile confidence interval method described above
will be referred to and used in the latter sections of Chapter III. For this study, we
assumed that the times to detect/classify originated from a gamma distribution.
Parametric bootstrap techniques use the structure of an assumed specific underlying
distributional model. Hence, we chose the parametric bootstrap technique, vice the
nonparametric bootstrap methodology, so as to include this additional information.
Given such a construct, large-sample approximations based on the same parametric
formulation could also be used to construct statistical confidence intervals. If one chose
to reject the assumption of gamma-distributed times to detect/classify, nonparametric
approaches, including the nonparametric bootstrap, offer other possible confidence
interval approaches.


20 As was true for the nonparametric bootstrap, 2,000 bootstrap samples of 13 were drawn and the
various search measures were computed. Again, the approximate confidence intervals reported here
are based on the percentile method.

21 The exception to this “good agreement” is the high end of the 80 percent MdSR confidence interval.

F.  SIMULATION  DESCRIPTION 

Using the previously described gamma distribution model (α' = 1.66 and β' =
4.54) for times to detect/classify, we can consider simulating a free-play search OT and
examining  various  test  control  rules.  We  built  such  a  simulation  that,  given  a  random 
draw  of  times  to  detect/classify,  allows  one  to  determine,  for  instance,  for  a  test  of  a 
given  length,  the  number  of  encounters  that  would  be  expected  to  occur.22  To  each 
random  time  to  detect/classify,  a  period  of  time  to  represent  localization,  attack,  and 
repositioning  is  added.  The  cumulative  sum  of  these  times  is  kept  track  of  until  the 
defined  number  of  test  days  is  reached.  In  addition,  test  rules  are  embedded  in  the  test 
control  process.  For  example,  for  a  20-hour  stopping  rule  and  a  50  percent  shrinkage 
rule,  the  first  time  drawn  from  the  gamma  distribution  that  is  greater  than  20  hours  would 
be  truncated  at  20  hours  and  2  hours  for  repositioning  would  be  added.  The  second  time 
to  detection/classification  drawn  that  is  greater  than  20  hours  (during  the  same  test 
period)  would  again  be  truncated  (with  the  addition  of  repositioning  time),  but  this  time 
the area size to be searched on the next trial would be shrunk by 50 percent. Using the
roughly linear relationship between area size and average time to detect/classify
described earlier (Table II-2), the mean times to detect/classify would also be shrunk by
50 percent for this next random draw. This procedure is followed until the specified
number of test days is used up, and all trials that reach FINEX, either via a truncation or
an encounter, are saved.
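
The bookkeeping described above can be sketched as follows. This is a hypothetical Python rendering, not the EXCEL™ spreadsheet itself. The function and variable names are ours, and three points are assumptions of the sketch rather than statements from the text: the area is shrunk after every second truncation, an event is counted only if its repositioning ends within the test period, and the expansion rule is left out.

```python
def simulate_trial(draws, test_days, loc_attack_hr, stop_hr, shrink_frac,
                   reposition_hr=2.0, start_area=800.0):
    """Replay one simulated OT from a sequence of pre-drawn times (hours)."""
    limit_hr = 24.0 * test_days
    clock_hr = 0.0
    area = start_area
    truncations_since_shrink = 0
    events = []  # (time to detect/classify, event duration, area, truncated?)
    for draw in draws:
        # Times scale linearly with the current area relative to the start.
        time_to_detect = draw * area / start_area
        truncated = time_to_detect > stop_hr
        if truncated:
            time_to_detect = stop_hr
            duration = stop_hr + reposition_hr
        else:
            duration = time_to_detect + loc_attack_hr + reposition_hr
        if clock_hr + duration > limit_hr:
            break  # event not completed within the test period
        clock_hr += duration
        events.append((time_to_detect, duration, area, truncated))
        if truncated:
            truncations_since_shrink += 1
            if truncations_since_shrink == 2:  # assumed: shrink on every 2nd truncation
                area *= 1.0 - shrink_frac
                truncations_since_shrink = 0
    encounters = sum(1 for e in events if not e[3])
    return {"events": len(events), "encounters": encounters,
            "truncations": len(events) - encounters, "final_area": area}


# Random draws listed in Table II-8 (16-hour stopping rule, 25 percent shrinkage).
TABLE_II_8_DRAWS = [3.57, 4.02, 29.51, 4.68, 2.95, 7.37, 17.07, 14.92,
                    7.60, 4.26, 4.20, 15.78, 3.89, 11.20, 10.35, 8.47]

if __name__ == "__main__":
    print(simulate_trial(TABLE_II_8_DRAWS, test_days=8, loc_attack_hr=4.0,
                         stop_hr=16.0, shrink_frac=0.25))
```

Replaying the draws of Table II-8 through this sketch gives 15 events, of which 13 are encounters and 2 are truncations, with a final area of 600 NM^2, matching the worked example in that table.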

22 This simulation is run on a Microsoft EXCEL™ spreadsheet.

For this study, four stopping rules were examined. These stopping rules (12, 16,
20, and 24 hours) were chosen in the following way. First, as described earlier, we
represented the times to detection/classification by a gamma distribution with α' = 1.66
and β' = 4.54. Given this representation of times to detect/classify, we computed a
probability of 0.184 for the detection/classification occurring at a time greater than 12
hours. Similarly, one computes probabilities of 0.088, 0.041, and 0.019 for times of 16, 20,
and 24 hours, respectively. These rules could be generalized to other gamma distributions
by simply using these four probabilities to recompute the stopping rules. For example, for
gamma-distributed times to detect/classify with α', β' = 1.5, 6.0 (i.e., mean time to
detect/classify = α'β' = 9.0 hours), and the probabilities given above (0.184, 0.088, 0.041,
0.019), truncation times of 14.5, 19.6, 24.8, and 29.9 hours are calculated, respectively.
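
The tail probabilities and the recomputed truncation times can be checked with a short calculation. The sketch below is one way to do so using only the Python standard library; it is not the method used in the study. The function names are ours, the gamma distribution function is evaluated with the standard power series for the regularized incomplete gamma function, and the truncation time is found by simple bisection.

```python
import math


def gamma_cdf(t, shape, scale):
    """P(T <= t) for a gamma distribution (series for the incomplete gamma function)."""
    if t <= 0.0:
        return 0.0
    x = t / scale
    term = 1.0 / shape
    total = term
    n = 1
    while term > 1e-15 * total:
        term *= x / (shape + n)
        total += term
        n += 1
    return total * math.exp(shape * math.log(x) - x - math.lgamma(shape))


def truncation_time(tail_prob, shape, scale):
    """Time t with P(T > t) = tail_prob, found by bisection."""
    lo, hi = 0.0, 20.0 * shape * scale  # upper bracket: 20 times the mean
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if 1.0 - gamma_cdf(mid, shape, scale) > tail_prob:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    for hours in (12, 16, 20, 24):
        p = 1.0 - gamma_cdf(hours, 1.66, 4.54)
        print(f"P(T > {hours} hr) = {p:.3f}")
    for p in (0.184, 0.088, 0.041, 0.019):
        print(f"tail {p}: truncate at {truncation_time(p, 1.5, 6.0):.1f} hr")
```

The printed values should come out close to those quoted in the text; small differences can arise because the gamma parameters are given to only two decimal places.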

Next, preliminary studies (Ref. II-17), in which 10-day OTs were simulated, were
initially  used  to  explore  these  stopping  rules  (and  the  shrinkage  rules).  The  goal  of  this 
“scoping  study”  was  to  assess  the  ability  of  these  rules  to  lead  to  area  shrinkage  when  the 
system  performed  worse  (half  as  well)  than  expected  (the  nominal  system  performance, 
System  A),  yet  not  to  shrink  the  area  when  nominal  system  performance  was  observed. 
Shrinking  the  area  after  the  first  observation  of  a  20-hour  truncation,  for  the  10-day  OT, 
led  to  frequent  shrinkage  when  nominal  system  performance  was  assumed.  Shrinking  the 
area  after  the  third  observation  of  a  20-hour  truncation,  again  for  a  10-day  OT,  was  not 
deemed  to  lead  to  enough  shrinkages  in  the  case  of  the  system  that  performed  at  half  the 
capability  of  the  nominal  system.  In  the  case  of  nominal  system  performance,  about  16 
percent  (17/107)  of  the  simulated  10-day  tests  led  to  shrinkage  when  the  shrinkage  rule 
was  invoked  after  the  second  20-hour  truncation.  For  the  system  that  performed  half  as 
well  as  the  nominal  system  (System  B),  using  the  observation  of  the  second  20-hour 
truncation  as  the  shrinkage  rule  led  to  83  percent  of  the  10-day  OTs  being  shrunk.  This 
sort  of  “trial  and  error”  approach  was  pursued  during  this  preliminary  study  (for  a  10-day 
OT)  and  led  to  the  above-described  choice  of  stopping  and  shrinkage  rules  for  further 
examination. 

As  discussed  in  the  Introductory  section  of  this  paper,  achieving  realism  during 
operational  tests  is  considered  a  high  priority.  To  this  end,  we  have  previously 
recommended  that  measures  be  taken  to  ensure  that  representative  levels  of  uncertainty 
be  maintained  during  ASW  search  OTs.  Further,  we  have  suggested  a  goal  of  keeping 
the  times  between  detection  long  enough,  on  average,  such  that  the  probability  of  a  search 
lasting less than the 6 hours of a typical sonar watch is about one-half or less (Ref. II-3,
page IV-10). This suggestion was meant to support tenable levels of crew uncertainty
(vigilance, see Ref. II-17). For the gamma distribution that we used to represent the
nominal system performance (α', β' = 1.66, 4.54; i.e., mean time to detect/classify = 7.54
hours), the corresponding probability of a detection/classification occurring within 6 hours
was 0.49. The conclusion, with respect to realism as defined above, is that a system with a
mean time to detect/classify of 7.54 hours (assuming gamma-distributed times to
detect/classify as described and an 800 NM² box) is appropriate.

In  order  to  guarantee  that  the  shrinkage  rules  described  above  would  not  lead  to 
an  area  so  small  that  the  above  vigilance  rule  would  be  violated,  we  applied  one 
“expansion”  rule  during  this  simulation.  A  five-point  and  six-point  running  average  of 
times  to  detect/classify  was  continuously  monitored  during  the  simulation.  If  the 
six-point  average  went  below  1.5  hours  or  the  five-point  average  went  below  1.0  hour, 
then  the  box  for  the  next  and  successive  searches  of  that  test  trial  (assuming  no  further 
triggering  of  shrinkage  or  expansion  rules)  would  be  doubled  in  size. 
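
The expansion check can be written compactly. The following is a minimal sketch, assuming that the running averages are taken over the most recent five and six times to detect/classify and that neither average is evaluated until enough events have occurred; the function name and argument names are hypothetical.

```python
def should_expand(times_to_detect, six_point_limit=1.5, five_point_limit=1.0):
    """Return True if the running averages call for doubling the search box.

    times_to_detect: times to detect/classify (hours) so far, oldest first.
    """
    recent = list(times_to_detect)
    six = recent[-6:]
    five = recent[-5:]
    if len(six) == 6 and sum(six) / 6.0 < six_point_limit:
        return True
    if len(five) == 5 and sum(five) / 5.0 < five_point_limit:
        return True
    return False


if __name__ == "__main__":
    print(should_expand([0.5, 0.5, 0.5, 0.5, 0.5]))       # five-point average is 0.5 hr
    print(should_expand([3.0, 3.0, 3.0, 3.0, 3.0, 3.0]))  # averages well above limits
```

When the check returns True, the area for the next and successive searches would be doubled, as described above.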

The expansion rules were arrived at via a trial and error procedure, similar to that
described for identifying the stopping rules. Again, this was accomplished during a
preliminary study of a simulated 10-day OT (Ref. II-17). Two system performance levels
were examined: half as good as nominal (System B) and twice as good as nominal
(System C). The chosen expansion rules led to expansions of the area for 0 (0/107) and
95 (61/65) percent of the simulated 10-day OTs (with a 20-hour stopping rule) for
Systems B and C, respectively.23 In general, these expansion rules did not appear to be
very important and are only briefly discussed in this paper.

The  test  control  rules  that  we  examined  are  not  the  only  ones  that  one  might 
choose,  nor  do  they  necessarily  represent  an  optimum  set.  Rather,  they  were  arrived  at  by 
a  process  of  trial  and  error  as  described  above.  The  simulation  study  reported  in  this 
document  further  explores  the  properties  of  this  chosen  set  of  test  control  rules. 

Two thousand four hundred random draws from the above-described gamma
distribution were prepared. This same set of random draws was used for each set of test
control rules that was examined in order to remove any variance resulting from the finite
size of the draw. This number of draws turned out to be enough to allow for at least 100
simulated OTs under all of the various test control rules that were examined. A total of
96 different test situations were examined. In each case, 100 trials (or simulated OTs)
were conducted.


23 Although not reported on here, several other expansion rules were investigated, including the use of
running medians, different threshold times, and different expansion amounts (50 percent versus 100
percent).

Table II-7 delineates the various test situations that were investigated. In all
cases, the time for repositioning was set at 2 hours and the initial area size to be searched
was 800 NM². These 48 conditions (2 x 3 x 4 x 2) were examined for our nominal 1994
system (Tables II-1 and II-2, average time to detect/classify = 7.54 hours) and for a
system that “performed” half as well, with an average time to detect/classify = 15.08
hours.24 Thus, a total of 96 (2 x 48) test situations were examined.


Table II-7. Test Situations That Were Simulated

Test Length    Localization and    Stopping Rule              Shrinkage Rule (%)
(Days)         Attack Time (Hr)    (Truncation Time - Hr)
4 and 8        2, 4, and 8         12, 16, 20, 24             25 and 50

Figure  II-9  presents  a  schematic  that  depicts  the  inputs,  outputs,  operations,  and 
feedback  loops  (shown  in  red  text)  associated  with  this  simulation  (as  described  above). 

Table II-8 shows the first 8-day trial, for a system with a 7.54-hour average time
to detect/classify, with an assumed average localization and attack time of 4 hours, and
using a stopping rule of 16 hours and a shrinkage rule of 25 percent. For this particular
trial, 15 events were simulated within 8 days. Note that 6 hours (4 for localization and
attack and 2 for repositioning) were added to each time to detect/classify that was less
than 16 hours to generate the time of the encounter and reposition. Thirteen of these 15
events resulted in an encounter. The two events in which the random draw exceeded 16
hours (shown in red font in Table II-8) were truncated at 16 hours, and 2 hours for
repositioning were added. After the second truncation, event 7, the area size was shrunk
by 25 percent to 600 NM² and the random draws were shrunk by an equal amount to
generate the new times to detect/classify (blue font in Table II-8).


24 This was accomplished by simply doubling the 2,400 random draw values described earlier.

[Schematic output box: number of encounters, truncations, and times to detect (search-MOE estimates)]

Figure II-9. EXCEL™ Spreadsheet Macro-Based OT Simulation Schematic
Table II-8. Example of One 8-Day Trial

Event     Random    Time to Detect/    Time of Encounter      Cumulative        Area Size
Number    Draw      Classify (Hr)      and Reposition (Hr)    Number of Days    (NM²)
1         3.57      3.57               9.57                   0.40              800
2         4.02      4.02               10.02                  0.82              800
3         29.51     16.00              18.00                  1.57              800
4         4.68      4.68               10.68                  2.01              800
5         2.95      2.95               8.95                   2.38              800
6         7.37      7.37               13.37                  2.94              800
7         17.07     16.00              18.00                  3.69              800
8         14.92     11.19              17.19                  4.41              600
9         7.60      5.70               11.70                  4.90              600
10        4.26      3.20               9.20                   5.28              600
11        4.20      3.15               9.15                   5.66              600
12        15.78     11.84              17.84                  6.40              600
13        3.89      2.92               8.92                   6.77              600
14        11.20     8.40               14.40                  7.37              600
15        10.35     7.76               13.76                  7.95              600
NCa       8.47      6.35               12.35                  >8                600

a NC = Not completed.

The results, or output, of these simulations include, for each of the 100 trials of a
given test situation, estimates of the average time to detect/classify, the final area size
being searched (i.e., at the end of the simulated OT), SR, MdSR, and MSR. These results
(in particular, tables that list the frequency of occurrence of various values for each
measure, i.e., data appropriate for histograms) are presented in Appendix B. The
average “observed” times, that is, the values that include truncated observations resulting
from test stopping rules, are also reported in Appendix B. Similarly, the observed values
of SR, MdSR, and MSR are reported. Chapter III provides an analysis of the results of
these simulations.
CHAPTER III


RESULTS AND ANALYSES


This  chapter  provides  an  analysis  of  the  results  of  the  96  sonar  search  OTs  that 
were  simulated.  This  analysis  focuses  on  an  examination  of  the  impact  of  various  test 
control  rules.  In  particular,  the  impact  of  these  truncation  and  shrinkage  rules  on  the 
number  of  encounters  obtained  (per  simulated  OT)  and  the  value  and  variance  of  the 
search  measures  of  effectiveness  (SR,  MdSR,  and  MSR)  estimated  from  these  simulated 
OTs  is  investigated. 

We  begin  by  describing  the  results  of  a  nominal  run.  Recall,  a  “run”  is  defined  as 
100  trials  of  a  given  test  situation.  A  test  “situation”  is  defined  by  the  various  test 
conduct  initial  conditions  (e.g.,  number  of  days,  area  size  at  start,  time  spent 
repositioning)  and  test  control  rules  (e.g.,  stopping  time,  shrinkage  percentage).1  A 
“trial”  corresponds  to  one  simulated  OT  of  a  given  length.  A  given  trial  may  have,  for 
example,  15  events  in  8  days,  where  12  of  the  events  ended  with  an  encounter  and  3 
ended  with  an  OTD-directed  truncation. 

A.  A  NOMINAL  RUN:  8-DAY  OT  WITH  20-HOUR  STOPPING  RULE  AND  25 
PERCENT  SHRINKAGE 

This section reports the results for a nominal run. The goal of this section is to
describe the type of information available from a given run. Table III-1 presents the
initial conditions and test control rules that were associated with this nominal run. Note
that for all runs described in this report the initial area size was 800 NM² and the time for
repositioning after an encounter or truncation was two hours.2

We define the following shorthand notation to identify this particular run:
{A|8|4|20|25}. The “A” corresponds to a sensor with an average time to detect/classify of
7.54 hours, the “8” refers to the number of test days simulated, the “4” identifies the
assumed average localization and attack time, the “20” represents the stopping rule, and
the “25” is the shrinkage percentage. This notation will be used occasionally in this
paper to identify particular runs.
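
As an illustration of this notation, the full set of 96 test situations can be enumerated programmatically. The sketch below is hypothetical: the use of a vertical bar as the separator and of “B” as the label for the half-as-good system are assumptions made here, and the names are ours.

```python
from itertools import product

SYSTEMS = ("A", "B")                 # nominal system and half-as-good system
TEST_DAYS = (4, 8)
LOC_ATTACK_HR = (2, 4, 8)
STOPPING_RULE_HR = (12, 16, 20, 24)
SHRINKAGE_PCT = (25, 50)


def run_label(system, days, loc_attack, stop, shrink):
    """Build a shorthand label such as {A|8|4|20|25}."""
    fields = (system, days, loc_attack, stop, shrink)
    return "{" + "|".join(str(v) for v in fields) + "}"


ALL_RUNS = [run_label(*combo) for combo in
            product(SYSTEMS, TEST_DAYS, LOC_ATTACK_HR, STOPPING_RULE_HR, SHRINKAGE_PCT)]

if __name__ == "__main__":
    print(len(ALL_RUNS))   # 2 x 2 x 3 x 4 x 2 situations
    print(ALL_RUNS[0])
```

Each label corresponds to one run, that is, 100 trials of the test situation it names.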


1 See Appendix B, Table B-1.

2 See Chapter II for a description of the expansion rules.

Table III-1. Test Situation Parameters for Nominal Run

Test Condition / Test Control Rule                        Value
System Performance: Average Time to Detect/Classify      7.54 hours
Test Days                                                 8
Localization and Attack Time                              4 hours
Stopping Time                                             20 hours
Shrinkage Percentage                                      25%

Figure III-1 presents a histogram of the number of events that were taken to
completion (i.e., resulted in an encounter) and the total number of events (i.e., those that
ended with an encounter or a truncation). Figure III-2 presents a histogram that describes
the final area sizes being searched. Figures III-3 through III-6 provide histograms of the
various search measures (i.e., average time to detect/classify, SR, MdSR, and MSR,
respectively).


[Histogram; horizontal axis: Number of Events Per Trial (11 to 18); median values marked]

Figure III-1. Events Per Simulated OT for Nominal Run

For this particular 8-day run, the median number of events was 14 with a median
number of encounters (events taken to completion) of 13. The total number of events
ranges from 11 to 18, as does the total number of encounters.

Of 100 trials started with an area size of 800 NM², eight were shrunk by 25
percent to 600 NM² by the end of the simulated OT and one was shrunk twice by 25
percent to 450 NM². See Figure III-2.

Figure III-2. Area Size at the Completion of the Simulated OT

The median value of the average time to detect/classify for a trial was about 8
hours (Figure III-3). Note that the impact of OTD-forced truncations (censoring) is to
shorten the “right tail” of the distribution.3 For this run, only one out of 100 simulated
OTs had an average search duration (time to detect/classify) less than the
6 hours of our typical sonar watch.

[Histogram; vertical axis: Frequency; horizontal axis: Average Time (Hr)]

Figure III-3. Histogram of Average Time to Detect/Classify Normalized to 800 NM²


3 The cross-hatched bars in Figure III-3 labeled “random draw with censoring” correspond to the
frequency of average times to detect/classify, for a given simulated OT, in which, for the events that
ended with a truncation, the truncation time (stopping rule) was used as if it were a
detection/classification time.

Figure III-4 presents a histogram of search rate (SR), where SR is defined as in
Chapter II. SR values computed from the actual (normalized to 800 NM²) random draws
and from the values obtained after the OTD-forced truncations are presented. The impact
of the truncations can be seen in Figure III-4 as somewhat less mass in the distribution of
SR for smaller values (less than 90 NM²/Hr). In both the “random draw” and “random
draw with censoring” cases, the median value of the SRs falls in the 110 NM²/Hr bin.
This value is in good agreement with the area size divided by the expected time to
detect/classify (α' x β' = 7.54 hours; 800/7.54 = 106 NM²/Hr).


[Histogram; vertical axis: Frequency; horizontal axis: Search Rate (NM²/Hr), bins 70 through 180]

Figure III-4. Histogram of Search Rate (SR) Estimates for Nominal Run


Figure III-5 presents the histogram for the search measure median search rate
(MdSR). Estimates of MdSR were never impacted by the OTD-forced truncations, since
in no case (i.e., for none of the 9,600 trials) did the number of truncations exceed one-half
of the total number of events. The median value of MdSR is about 130 NM²/Hr.

Figure III-5. Histogram of Median Search Rate Estimates for Nominal Run


Finally, Figure III-6 shows the histograms for mean search rate (MSR). As was
the case with SR, the truncations shift the distribution slightly to the right (higher values).
The median value of MSR, for both cases shown in Figure III-6, is about 240 NM²/Hr.
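
The different sensitivity of the three measures to censoring can be illustrated with the 15 completed events of Table II-8. The sketch below is only an illustration: it uses the 16-hour stopping rule of that table rather than the 20-hour rule of the nominal run, and it assumes that SR is the area divided by the mean time, MdSR is the median of the per-event rates, and MSR is the mean of the per-event rates (the formal definitions are given in Chapter II).

```python
import statistics

AREA = 800.0      # NM^2
STOP_HR = 16.0    # stopping rule used in the Table II-8 example

# First 15 random draws from Table II-8 (hours, normalized to 800 NM^2).
draws = [3.57, 4.02, 29.51, 4.68, 2.95, 7.37, 17.07, 14.92,
         7.60, 4.26, 4.20, 15.78, 3.89, 11.20, 10.35]
# Censored version: a truncated event is scored at the stopping time.
censored = [min(t, STOP_HR) for t in draws]


def sr(times):
    """Search rate, assumed to be area over mean time."""
    return AREA / statistics.mean(times)


def mdsr(times):
    """Median search rate, assumed to be the median of the per-event rates."""
    return statistics.median([AREA / t for t in times])


def msr(times):
    """Mean search rate, assumed to be the mean of the per-event rates."""
    return statistics.mean([AREA / t for t in times])


if __name__ == "__main__":
    for name, measure in (("SR", sr), ("MdSR", mdsr), ("MSR", msr)):
        print(f"{name}: raw {measure(draws):.1f}, censored {measure(censored):.1f}")
```

With only 2 of the 15 events truncated, the median rate is unchanged, while SR and MSR both move to higher values once the two long searches are cut off at the stopping time.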
Figure III-6. Histogram of Mean Search Rate Estimates for Nominal Run

B. IMPACT OF TEST CONDITIONS AND TEST CONTROL RULES ON THE
NUMBER OF EVENTS PER SIMULATED OT

This  section  examines  the  number  of  events  that  resulted  from  the  various  test 
conditions  and  test  control  rules. 


1.  Impact  of  Test  Conditions 


Figure III-7 shows the impact of test duration on the number of test events. The
test condition and control rules described in the last section, {A|8|4|20|25}, are compared
to the same conditions and rules but for only a 4-day test duration, that is, {A|4|4|20|25},
in Figure III-7. Figure III-7 plots the median value (based on 100 trials) for the number
of events taken to completion and the median value for the total number of events. The
“error” bars in Figure III-7 correspond to the 80 percent interval of the computed
distribution.4 For this case and, in general, the relationship between test duration and the
number of events is roughly linear. That is, doubling the test duration from 4 to 8 days
leads to a doubling in the expected number of total events from 7 to 14. (Exceptions to
this linearity are discussed later in this section.)
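
A rough plausibility check on this linearity is to divide the available test hours by the mean duration of a single event. The sketch below is a back-of-the-envelope estimate that is not part of the study: it ignores the shrinkage and expansion rules and the partial event left at the end of the test, and the function names, number of draws, and seed are assumptions.

```python
import random

ALPHA, BETA = 1.66, 4.54   # nominal gamma shape and scale (hours)


def mean_event_hours(stop_hr, loc_attack_hr, reposition_hr=2.0,
                     n_draws=200_000, seed=1):
    """Monte Carlo mean duration of one event under a stopping rule."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_draws):
        t = rng.gammavariate(ALPHA, BETA)
        if t > stop_hr:
            total += stop_hr + reposition_hr             # truncated event
        else:
            total += t + loc_attack_hr + reposition_hr   # event ending in an encounter
    return total / n_draws


def approx_events(test_days, stop_hr=20.0, loc_attack_hr=4.0):
    """Test hours divided by mean event duration (a crude event-count estimate)."""
    return 24.0 * test_days / mean_event_hours(stop_hr, loc_attack_hr)


if __name__ == "__main__":
    for days in (4, 8):
        print(f"{days}-day test: about {approx_events(days):.1f} events")
```

Under these assumptions the estimate is proportional to test length by construction and falls between 14 and 15 events for an 8-day test, which is broadly consistent with the simulated medians of 7 and 14.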
Figure III-7. Impact of Test Duration on Number of Events


4 That is, the lower bar corresponds to the 10th percentile value for the 100 trials and the upper bar
corresponds to the 90th percentile value of the 100 trials.

Figure III-8 portrays the impact of increasing the average time for localization and
attack from 2 to 8 hours.5 Figure III-8 has a format similar to Figure III-7 (i.e., median
values and 80 percent intervals are shown). As expected, the number of test events
decreases as the average number of hours assumed for localization and attack increases.
This relationship appears to be roughly linear with a somewhat more negative slope for
the longer test duration. (The dashed lines of Figure III-8 correspond to “least-squares”
linear fits.)

Figure III-8. Impact of Average Time for Localization and Attack on Number of Events


5 Figure III-8 examines the impact of the systematic variation of the assumed localization and attack
time. Figure III-8 compares the results of the following six runs: {A|8|2|20|25}, {A|8|4|20|25},
{A|8|8|20|25}, {A|4|2|20|25}, {A|4|4|20|25}, and {A|4|8|20|25}.

We now consider the impact of testing a system that is nominally half as good
(perhaps because the environment or target is twice as difficult) as the one we have
previously assumed. That is, the times to detect/classify for this poorer system (“System B”)
are, on average, twice as long as the times associated with the nominal system (“System A”).
Recall (Chapter II, page II-19) that System A was based on past observations and for the
test conditions considered had an average time to detect/classify of 7.54 hours. Thus,
System B, for the same set of test conditions, is simulated by using an average time to
detect/classify of 15.08 hours.

Figure  III-9  presents  the  results  of  our  simulations  for  System  A,  in  blue,  and 
System  B,  in  red.6  Figure  III-9  employs  a  format  similar  to  the  two  previous  figures.7  A 
few  conclusions  are  evident  from  Figure  III-9.  First,  the  poorer  system,  being  tested  in  an 
800 NM² box, leads to fewer events. However, halving the system performance (i.e.,
doubling  the  times  to  detect/classify)  did  not  halve  the  number  of  events.  For  example, 
for  an  assumption  of  2  hours  (on  average)  for  localization  and  attack  and  8  days  of 
testing,  System  A  had  median  values  of  16  for  events  taken  to  completion  and  total 
events,  whereas  System  B  had  median  values  of  10  and  13,  respectively.  The  same  trend 
is  observed  for  the  other  assumptions  of  localization  and  attack  time  and  for  the  4-day 
test  duration  as  well.  In  part,  this  relative  test  efficiency  (i.e.,  the  event  sample  size  does 
not  decrease  as  fast  as  the  system  performance)  is  due  to  the  20-hour  stopping  rule,  which 
truncated  some  of  the  longest  events  that  would  have  “wasted  OT  time.” 

A  second  feature  of  Figure  III-9  is  related  to  the  magnitude  of  the  differences 
between  the  events  taken  to  completion  and  the  total  number  of  events  (compare  circles 
to  triangles).  For  System  A,  the  median  value  for  events  taken  to  completion  (i.e.,  ending 
in  an  encounter)  is  generally  equal  to  the  median  value  for  the  total  number  of  events. 
The  exception  is  the  8-day,  4-hour  localization/attack  time  case,  for  which  the  median 
value  for  the  total  number  of  events  is  14  and  the  median  value  for  events  taken  to 
completion  is  13.  As  expected  for  System  B,  the  20-hour  stopping  rule  is  occasionally 
“triggered”  and  results  in  differences  between  the  median  values  of  events  taken  to 
completion  and  total  events.  For  example,  for  the  8-day  test  and  the  2-hour  assumed 
localization/attack  time  case,  the  median  value  for  the  total  number  of  events  is  13  and 
the  median  value  for  events  taken  to  completion  is  10. 


6  The  results  related  to  4-day  test  simulations  are  shown  with  shaded  circles  or  triangles  and  the  8-day 
test  results  are  shown  with  open  circles  or  triangles. 

7 In fact, the points and error bars shown in blue in Figure III-9 (System A) correspond exactly to those
of Figure III-8.

Figure III-9. Impact of Assumed System Performance on Number of Events


If two or more events were truncated during a trial, the test control rules caused
the area size to be shrunk, in this case by 25 percent (from 800 NM² to 600 NM²). Figure
III-10 shows the distributions of area sizes at the end of the simulated tests (all tests
started at 800 NM²) for six conditions. As expected, the trials involving System B led to
the employment of shrinkage rules much more often than the comparable trials involving
System A. For instance, for the 4-day simulated OT, only 1, 3, and 1 of 100 System A
trials led to shrinkage to a 600 NM² box, for the 2-, 4-, and 8-hour localization and attack
assumptions, respectively. The comparable numbers for the System B simulations are
31, 21, and 25 of 100, respectively, with a few of those (3, 3, and 1) shrinking yet again
to 450 NM². For the 8-day set of trials, the effect is more dramatic (Figure III-10). In the
three 8-day, System B cases shown in Figure III-10, the median value of the final area
size searched is 600 NM². That is, the shrinkage rule described here (i.e., two 20-hour
truncations lead to a 25 percent shrinkage) is usually employed in the case where the
system performance is one-half of what was expected (System B).

Figure III-10. Employment of Shrinkage Rule: Distribution of Area Sizes
at the End of Simulated OT



2.  Impact  of  Test  Control  Rules 


This  section  focuses  on  the  impact  of  the  chosen  stopping/shrinkage  rules  (12,  16, 
20,  or  24  hours  and  25  or  50  percent  shrinkage)  on  the  number  of  events  expected. 
Figures III-11 and III-12 show the effect of differing stopping rules on the median value
of  the  number  of  events  taken  to  completion  and  the  median  value  of  the  total  number  of 
events,  respectively.  (Both  of  these  figures  result  from  the  usage  of  a  25-percent 
shrinkage  rule.) 

The  stopping  rules  can  increase  the  number  of  events  per  trial  (simulated  OT)  in 
two  ways.  First,  events  that  might  be  very  long  (e.g.,  >  one  day)  are  truncated,  saving 
some  of  the  post-truncation  search  time  for  the  next  event.  Second,  if  the  stopping  rule  is 
invoked  twice  during  one  trial,  the  area  size  is  shrunk  by  some  specified  amount  (25 
percent  in  this  case),  and  the  follow-on  times  to  detect/classify  are  therefore  shorter,  on 
average.  Taken  over  enough  test  days,  these  stopping  rules  will  increase  the  number  of 
events. 

With respect to the median value of the number of events taken to completion
(Figure III-11), it is seen that for 4-day tests, differences in stopping rules have no
impact.8 For the nominal system (System A), there is only marginal impact at the 8-day
mark. However, for the system that performs more poorly than expected (System B), at
8 days, the shorter test control stopping rules, 12 and 16 hours, result in a larger median
value. This trend is magnified when the median value of the total number of events per
trial is considered (Figure III-12). For example, in the case of an 8-day test of System B
(with a 4-hour localization/attack time), the 12-hour rule results in a median value of total
number of events per trial of 15, whereas the 24-hour rule leads to a median value of 10
(i.e., the 12-hour value is 50 percent larger). Even for the shorter 4-day test, the results for
System B vary as a function of stopping rule from 6 to 4, 7 to 5, and 8 to 5, for the three
assumed average times for localization and attack (2, 4, and 8 hours, respectively).

We conclude that shorter stopping rule times can increase the number of events
(and events to completion) for longer test periods and/or when the system (including the
target and environment) being tested is a significantly poorer detector/classifier than
initially expected (i.e., poorer than the system assumed when “sizing” the original search
box).


8 Recall that the values reported in Figure III-11 correspond to the median values of the number of
events taken to completion (from distributions reported in Appendix B). An examination of the actual
distributions shows the small expected changes in the number of events taken to completion for the
4-day test case (given the usage of these test control rules).

[Horizontal axis: Stopping Rule (Hr)]

Figure III-11. Test Control Stopping Rules: Median Value of the Number
of Events Taken to Completion

Figure III-12. Test Control Stopping Rules: Median Value of the Total Number of Events

Figure III-13 compares two test control shrinkage rules, 25 and 50 percent. The
solid lines of this figure correspond to the median value for the number of events taken to
completion (as always, for 100 trials) with the 25 percent shrinkage rule employed. That
is, the solid colored lines of Figure III-13 are identical to those of Figure III-11. The
“diamonds” and “stars” of Figure III-13 correspond to the values associated with the use
of a 50 percent shrinkage rule. Figure III-14 is of the same format but presents the
median values for the total number of events. That is, the solid lines of Figure III-14
correspond to the data of Figure III-12. The diamonds and stars of Figure III-14
correspond to the median values that result from a 50 percent shrinkage rule.

Comparing the results for the 50 percent and 25 percent shrinkage rules (Figures III-13
and III-14), we first note that the median values associated with the 50 percent rule trials
are always greater than or equal to those of the 25 percent rule. This is because, when
invoked, the 50 percent shrinkage rule leads to smaller areas to be searched and, hence,
shorter times, on average, to detection/classification. Over a long enough period of test
time, this results in more events taken to completion and more total events. There is little
difference between the 25 and 50 percent shrinkage rule results for the 4-day test.
Similarly, for 8-day tests of System A, the system that performs as expected, the 50
percent rule adds only 1 or 2 to the median value when the 12-hour stopping rule is
employed. However, for the 8-day test of System B, the 50 percent shrinkage rule
resulted in significantly more events taken to completion and total events. For example,
for the 16-hour rule (System B, 2-hour average localization/attack time), the 50 percent
rule resulted in a median value of 15 and the 25 percent rule resulted in a median value of
11. That is, for this case, a 36 percent increase in the median value of events taken to
completion was caused by changing the shrinkage rule from 25 to 50 percent.

The impact of a 50 percent shrinkage rule, relative to a 25 percent rule, on the
median value of total events can be seen in Figure III-14. The results are similar to those
discussed above for the number of events taken to completion, albeit with a somewhat
smaller impact. For instance, in the case of the 8-day test of System B with a 16-hour
stopping rule and a 2-hour average time for localization and attack, the 50 percent
shrinkage rule results in a median value of total events of 17 and the 25 percent rule
results in a median value of 15.

Figure III-13. Test Control Shrinkage Rules: Median Value of the Number
of Events Taken to Completion

Figure III-14. Test Control Shrinkage Rules: Median Value of the Total Number of Events

Figures III-11 through III-14 also allow for a comparison of the median values of
the  number  of  events  taken  to  completion  or  the  median  value  of  the  number  of  total 
events  as  a  function  of  the  System  being  tested,  A  or  B.  As  discussed  earlier  (Figure 
III-9),  the  test  control  rules  (stopping  time  and/or  shrinkage  percent)  tend  to 
mitigate,  by  design,  the  effect  of  system  performance  on  the  number  of  events  realized. 
For  longer  tests,  for  example,  8  days,  even  a  system  that  performs  half  as  well  as 
expected  can  be  tested  with  these  rules  such  that  about  the  same  number  of  events  is 
realized  as  would  have  been  if  the  system  had  performed  as  expected.  For  instance,  for 
the  16-hour  stopping  rule  and  50  percent  shrinkage  rule  (with  a  4-hour  average  time  for 
localization  and  attack),  the  median  values  for  events  taken  to  completion  and  total  events 
are  14  and  15,  respectively,  for  System  A.  For  System  B,  the  median  values  are  12  and 
14,  respectively  (within  15  percent  of  the  System  A  values). 

We  also  investigated  the  potential  for  time  savings  resulting  from  the  use  of  test 
control  rules  in  a  free-play  OT.  For  this  examination,  we  simulated  an  event-terminated, 
rather  than  time-terminated,  free-play  OT.  We  considered  a  15-encounter  (i.e.,  events 
taken  to  completion)  OT  in  which  System  B,  the  poorer  performer,  was  tested  without 
test  control  rules.  We  also  simulated  the  same  situation  with  the  16-hour  stopping  rule 
and  the  50  percent  shrinkage  rule.  In  both  cases,  100  trials  were  run  (using  the  same 
initial  set  of  random  draws)  and  4  hours  were  assumed  for  the  average  time  to  localize, 
attack, and reposition. Figure III-15 shows the cumulative probability of completing such
a  15-encounter  OT  as  a  function  of  test  days.  Without  the  test  control  rules  (curve  shown 
in  black),  it  takes  between  8  and  18  days,  with  a  median  value  of  about  12  days,  to 
complete  this  free-play  OT.  By  using  the  16-hour/50  percent  shrinkage  rule  (curve 
shown  in  red),  this  15-encounter  OT  of  System  B  takes  between  7  and  12  days,  with  a 
median  value  of  about  8  days.  That  is,  using  the  median  values  for  comparison,  these 
specific  test  rules  allow  this  OT  of  System  B  to  be  completed  in  33  percent  less  time. 
Figure  III-15  also  shows  the  cumulative  probability  of  completing  a  15-event  (encounters 
plus  truncations)  OT  with  the  test  control  rules  (curve  shown  in  blue).  This  OT  is 
completed  in  between  6  and  9  days,  with  a  median  value  of  about  7  days. 

Figure III-15. Time Savings Due to the Use of Test Control Rules:
Number of Days Required for a 15-Event OT of System B

Of course, triggering the test control rules too often, that is, truncating too many
events or shrinking the area size by too much, can lead to unwanted effects. If the area is
shrunk too much, the times to detect/classify may become unrealistically short. Recall
(Chapter II) that our test control rules (and, hence, our simulation) contained an
“expansion” rule. If ever the running six-point average of the times to detect/classify
went below 1.5 hours or the running five-point average went below 1.0 hour, the
expansion rule called for doubling the size of the box.
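
The expansion rule as stated above can be sketched in a few lines of Python. This is only an illustration: it assumes that each "running" average is taken over the most recent times to detect/classify, and it does not settle whether truncated events contribute to those averages, since that detail is defined in Chapter II. The function names are hypothetical.

```python
def expansion_triggered(times, six_point_limit=1.5, five_point_limit=1.0):
    """Return True if the running averages of the latest times (hours) call for expansion."""
    # Six-point rule: average of the six most recent times below 1.5 hours.
    if len(times) >= 6 and sum(times[-6:]) / 6 < six_point_limit:
        return True
    # Five-point rule: average of the five most recent times below 1.0 hour.
    if len(times) >= 5 and sum(times[-5:]) / 5 < five_point_limit:
        return True
    return False


def apply_expansion(area, times):
    """Double the box area when the expansion rule is triggered; otherwise keep it."""
    return 2.0 * area if expansion_triggered(times) else area
```

A caller would append each new time to detect/classify to `times` and pass the current box area through `apply_expansion` before starting the next event.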

For  the  96  runs  (test  situations)  that  we  examined,  only  12  led  to  the  triggering  of 
this  expansion  rule.  Triggering  the  expansion  rule  is  an  indication  that,  for  at  least  a 
portion  of  the  test,  the  times  to  detect/classify  were  unrealistically  short.  Eleven  of  the  12 
runs  in  which  the  expansion  rule  was  invoked  involved  the  50  percent  shrinkage  rule  and 
9  of  the  12  runs  involved  the  12-hour  stopping  rule.  The  combination  of  the  12-hour 
stopping  rule  and  50  percent  shrinkage  rule  accounted  for  seven  of  the  12  runs  that 
invoked  the  expansion  rule. 

Table III-2 lists the runs in which the expansion rule was invoked. Table III-2
also lists the number of trials (out of 100) in which the expansion rule was triggered for
each run in which it was triggered and the (approximate) average time to detect/classify
that was realized for the median trial of that run. Recall that the average time to
detect/classify for System A in an 800 NM² box should be 7.54 hours. In situations in
which the rules were rarely invoked, for example, when using the 24-hour stopping rule,
the average time to detect/classify for the median trial (no truncations) for System A
(8-day test, 8-hour average localization and attack time, and 50 percent shrinkage rule) was
7.45 hours (close to the nominal value of 7.54 hours). The comparable run (median trial
with one truncation) for System B was 10.75 hours. Recall that System B starts the test in an
800 NM² area with an average time to detect/classify of 15.08 hours. Of course, once the
50 percent shrinkage rule is triggered, these times to detect/classify will, on average, get
50 percent shorter. The average times discussed above, 7.54 and 7.45 hours for System
A and 10.75 hours for System B, result from situations in which the stopping rules were
rarely or never invoked, and represent points for comparison to the times shown in Table
III-2. The runs shown in Table III-2 correspond to situations in which the shrinkage rule
was invoked so often that the expansion rule was eventually triggered.

Table III-2. Test Situations in Which the Expansion Rule Was Invoked

System  Test  L&A Time  Stopping Rule (Hr) /  # of Trials     Average Time to      Approximate Probability
        Days  (Hr)a     Shrinkage Rule (%)    With Expansion  Detect/Classify for  of Detect/Classify
                                                              Median Trial (Hr)    Within 6 Hours
A       4     2         12/50                 1               6.25                 0.58
A       8     8         12/50                 3               5.27                 0.67
A       8     4         12/50                 2               5.13                 0.68
A       8     2         12/25                 1               4.82                 0.71
A       8     2         12/50                 10              4.63                 0.73
A       8     2         16/50                 2               6.24                 0.58
B       8     8         12/50                 1               5.00                 0.69
B       8     8         16/50                 1               7.33                 0.51
B       8     4         12/50                 4               4.71                 0.72
B       8     2         12/50                 9               4.33                 0.76
B       8     2         16/50                 1               6.40                 0.57
B       8     2         20/50                 1               8.33                 0.45

a  L&A = Localization and attack. The approximate average times to detect/classify for the
median trial were computed as follows. First, the number of events taken to completion and
truncations for the median trial was extracted from Appendix B. Next, the hours due to
localization, attack, repositioning, and truncation were subtracted from 96 hours (the
approximate length of the test). The remaining “search” hours were divided by the number of
detections/classifications for the median trial.


The approximate average times to detect/classify shown in Table III-2 for the
12-hour stopping/50 percent shrinkage rule, as expected, are shorter than the nominal
(System A in 800 NM² area) time of 7.54 hours. The fact that the expansion rule was
triggered, that is, several short times to detect/classify in a row were realized, and that the
average time to detect/classify was found to be significantly less than the nominal value
suggests that at least some portion of this OT would have been conducted with unrealistic
times to detection/classification.

Assuming a gamma distribution with α' = 1.66 (our initial distributional model),
and rescaling β' to the average times reported in Table III-2 (β' = mean time / α'), allows
us to estimate, based on this new fitted gamma distribution, the probability of
detection/classification occurring on a 6-hour sonar watch (that starts at COMEX). The
last column of Table III-2 presents these probabilities. The 8-day test situations that
involve the 12-hour stopping rule and the 50 percent shrinkage rule led to conditions in
which we estimate the expected probability of detection within 6 hours for the median
trial to be greater than our test realism rule-of-thumb described in Chapter II (“less than
half a chance of a detection/classification within 6 hours”). For these situations, the
probabilities of detect/classify within 6 hours for the median trial vary from 0.67 to 0.73
and between 0.69 and 0.76 for Systems A and B, respectively. For comparison, using the
24-hour stopping rule and 25 percent shrinkage rule over an 8-day test led to comparably
computed probabilities for the median trial of between 0.47 and 0.50 and between 0.29
and 0.37 for Systems A and B, respectively.
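
The rescaling step described above can be sketched as follows: fix the gamma shape at 1.66, set the scale so that the mean equals the average time from Table III-2, and evaluate the gamma cumulative distribution at 6 hours. The incomplete gamma function is computed with its standard power series so that only the Python standard library is needed. The function names are hypothetical, and this is an illustration rather than the spreadsheet calculation actually used.

```python
import math


def gamma_cdf(x, shape, scale):
    """P(T <= x) for a gamma distribution, using the power series for the
    regularized lower incomplete gamma function."""
    if x <= 0.0:
        return 0.0
    z = x / scale
    term = 1.0
    total = 1.0
    n = 0
    # Sum z^n / ((shape+1)(shape+2)...(shape+n)) until the terms are negligible.
    while term > 1e-15 * total and n < 10000:
        n += 1
        term *= z / (shape + n)
        total += term
    log_prefactor = shape * math.log(z) - z - math.lgamma(shape + 1.0)
    return min(1.0, math.exp(log_prefactor) * total)


def prob_detect_within(hours, mean_time, shape=1.66):
    """Rescale the gamma scale so the mean equals mean_time, then evaluate the CDF."""
    scale = mean_time / shape
    return gamma_cdf(hours, shape, scale)
```

For instance, `prob_detect_within(6.0, 6.25)` and `prob_detect_within(6.0, 4.33)` correspond to the first System A row and the 12/50 System B row with a 2-hour localization and attack time, and can be compared with the 0.58 and 0.76 listed in Table III-2.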

We conclude that, although the 12-hour stopping/50 percent shrinkage rule offers
the most events for a test of a given duration, it sometimes comes at a steep price:
test realism. Therefore, for systems like A or B, we would want to avoid the combination
of short stopping rules (e.g., 12 hours) and large shrinkage rules (e.g., 50 percent). On
the other hand, stopping rules of 16, 20, and 24 hours with shrinkage rules of 25 percent
or, in general, 50 percent appear satisfactory, with the 16-hour stopping rule offering real
benefits with respect to sample size robustness to system performance in an 8-day test
(compare System A and B in Figures III-13 and III-14).

A  second  potential  problem  with  the  employment  of  stopping  rules  is  the  effect  of 
truncated  events  on  estimates  of  search  MOEs.  This  issue  and  others  related  to  estimates 
of  search  MOEs  are  described  in  the  next  section. 

C.  IMPACT  OF  TEST  CONDITIONS  AND  TEST  CONTROL  RULES  ON 
ESTIMATES  OF  SEARCH  MOES 

Three search MOEs are examined in this section: SR, MdSR, and MSR. Chapter
II (Equations 5a through 5c) provides definitions of these search-related MOEs.
Importantly, for simulated trials in which events are truncated, the censored times, for
example, 20 hours for the 20-hour stopping rule, can be used directly to estimate the
various search-related MOEs.

Figure III-16 presents the median values and the 80 percent intervals of the three
search MOEs that resulted from simulation (100 trials) with the 24-hour stopping rule
and the 25 percent shrinkage rule.9 We note that the median values, represented by the
solid triangles, do not vary much as a function of test duration or assumed localization
and attack time.

Figure III-16. Effect of Test Conditions on Search MOE Variance
(24-Hour Stopping Rule and 25 Percent Shrinkage Rule)
[horizontal axis: Localization and Attack Time (Hr)]

9  Figure III-16 compares the results of the following twelve runs: {A|8|2|24|25}, {A|8|4|24|25},
{A|8|8|24|25}, {A|4|2|24|25}, {A|4|4|24|25}, {A|4|8|24|25}, {B|8|2|24|25}, {B|8|4|24|25},
{B|8|8|24|25}, {B|4|2|24|25}, {B|4|4|24|25}, and {B|4|8|24|25}.

We can consider the length of the 80 percent interval, based on 100 trials, for each
of  our  test  conditions  and  test  control  rules  as  a  measure  of  variance  for  our  estimates  of 
search-related  MOEs.  We  normalize  this  80  percent  interval  length  by  dividing  it  by  our 
“point  estimate,”  really  the  median  value  of  the  appropriate  search-related  MOE,  to  create 
a  parameter  that  we  refer  to  as  the  interval  length  /  point  estimate  (IL/PE).  This  unitless 
IL/PE  value  allows  us  to  gauge  the  differences  in  variance  between  measures  and  across 
simulated  test  situations.  One  can  think  of  the  IL/PE  as  simply  the  fraction  of  the  MOE 
value  that  the  80  percent  interval  represents.  For  example,  if  the  length  of  the  80  percent 
interval  is  two  times  the  magnitude  of  the  median  value  of  the  given  MOE,  the 
corresponding  IL/PE  value  will  be  2.0. 
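
The IL/PE calculation can be sketched in Python as shown below. The sketch assumes that the 80 percent interval runs from the 10th to the 90th percentile of the per-trial MOE values; the percentile convention actually used is not stated here, and the function name is hypothetical.

```python
import statistics


def il_pe(moe_values):
    """Length of the central 80 percent interval divided by the median."""
    # statistics.quantiles with n=10 returns nine cut points: 10th ... 90th percentile.
    deciles = statistics.quantiles(moe_values, n=10)
    interval_length = deciles[-1] - deciles[0]
    return interval_length / statistics.median(moe_values)
```

Given the 100 per-trial estimates of SR, MdSR, or MSR from one run, `il_pe` returns the unitless value described above.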

Figure III-17 presents the IL/PE values for a variety of test situations (but always
with the 25 percent shrinkage rule). As expected, for all three MOEs, the “variance”
(IL/PE) decreases with increasing sample size, resulting from either a longer test
duration (8 days versus 4) or less time spent in localization and attack (2, 4, or 8 hours).
The IL/PE values are similar for MdSR and SR, with the SR values being somewhat
smaller. The IL/PE values associated with MSR are about twice as large as those
associated with SR. The suggestion is that the variance associated with estimates of
MSR will be significantly greater than that associated with either SR or MdSR.

In addition, Figure III-17 compares IL/PE values for the four different stopping
rules that were simulated.10 First, for System A, there appears to be little consistent
impact associated with the choice of test control stopping rule. The exception is the
4-day OT case in which the average localization and attack time is taken as 8 hours, that
is, the smallest sample size case for System A. In this exceptional case, the 24-hour
stopping rule consistently results in the largest IL/PE for all three MOEs.

For System B, the system that performs at half the level of our expectation,
Figure III-17 suggests a greater sensitivity of the IL/PE value to the choice of stopping
rule. The suggestion is that the shorter stopping rules, 12 and 16 hours, in general, result
in less variance.


10  Recall, Figure III-16 examined only one stopping rule (24 hours).

Figure III-17. Effect of Stopping Rule on Search MOE Variance:
25 Percent Shrinkage Rule Only

For completeness, Figure III-18 presents all 288 (96 × 3) IL/PE values. That is,
the solid lines of Figure III-18 correspond exactly to the 48 cases (for each measure) that
use the 25 percent shrinkage rule shown in Figure III-17. The various symbols of Figure
III-18 correspond to the associated runs with the 50 percent shrinkage rule. The
conclusions of the last three paragraphs remain unchanged.

To this point, the MOEs have been computed by simply including the censored
data (20 hours, if the trial was stopped at that time) as if they represented
detection/classification times. As discussed earlier (page III-5), estimates of MdSR were
never affected by this potential bias mechanism. On the other hand, estimates of SR and
MSR could be significantly biased given the direct inclusion of this censored data in the
calculation of these MOEs. As shown earlier in this chapter (Figures III-4 and
III-6), this biasing is expected to be greatest for estimates of SR (vice MSR).

For SR, we can mitigate this effect by estimating the mean time to detect/classify
using the MLE technique described in Chapter II (Equation II-10). This technique allows
one to incorporate the information available from the truncated trials. Throughout the
next section, this technique is used.
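
To illustrate how truncated events can enter a likelihood, the sketch below fits a gamma distribution to right-censored times. It is not a reproduction of Equation II-10, which is given in Chapter II; it is a generic censored-gamma maximum likelihood fit in which each completed event contributes the gamma density and each truncated event contributes the probability of exceeding the stopping time. The optimizer is a simple compass search chosen only to avoid external libraries, and all names are hypothetical.

```python
import math


def gamma_sf(x, shape, scale):
    """P(T > x) for a gamma distribution (power series for the incomplete gamma)."""
    z = x / scale
    if z > 500.0:
        return 0.0  # far in the tail for any moderate shape; also avoids overflow
    term = total = 1.0
    n = 0
    while term > 1e-15 * total and n < 10000:
        n += 1
        term *= z / (shape + n)
        total += term
    cdf = math.exp(shape * math.log(z) - z - math.lgamma(shape + 1.0)) * total
    return max(0.0, 1.0 - cdf)


def log_likelihood(shape, scale, observed, n_censored, stop_time):
    """Gamma log-likelihood with n_censored events truncated at stop_time."""
    ll = sum((shape - 1.0) * math.log(t) - t / scale for t in observed)
    ll -= len(observed) * (shape * math.log(scale) + math.lgamma(shape))
    if n_censored:
        ll += n_censored * math.log(max(gamma_sf(stop_time, shape, scale), 1e-300))
    return ll


def fit_censored_gamma(observed, n_censored, stop_time):
    """Maximize the censored log-likelihood over (log shape, log mean) by compass
    search; return the fitted (shape, scale)."""
    n = len(observed) + n_censored
    naive_mean = (sum(observed) + n_censored * stop_time) / n
    u, v = math.log(1.5), math.log(naive_mean)  # starting point

    def objective(log_shape, log_mean):
        shape = math.exp(log_shape)
        scale = math.exp(log_mean) / shape
        return log_likelihood(shape, scale, observed, n_censored, stop_time)

    best, step = objective(u, v), 0.5
    for _ in range(5000):  # hard cap on iterations
        if step < 1e-6:
            break
        candidates = [(u + step, v), (u - step, v), (u, v + step), (u, v - step)]
        value, cu, cv = max((objective(a, b), a, b) for a, b in candidates)
        if value > best:
            best, u, v = value, cu, cv
        else:
            step /= 2.0  # no improving neighbor: refine the step size
    shape = math.exp(u)
    return shape, math.exp(v) / shape
```

The fitted mean time is `shape * scale`, and an SR estimate would follow as the area size divided by that mean. Treating the censored times as if they were detections corresponds to `naive_mean` in the code, which understates the mean time and therefore overstates SR.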

With  respect  to  search-related  MOEs,  our  analyses  to  this  point  suggest  the 
following: 

•  Estimates  of  MSR  will  have  the  greatest  variance.  MdSR  and  SR  will  have 
similar  variance  associated  with  their  estimation. 

•  For a system that performs worse than expected (i.e., System B), using
shorter stopping rules can decrease the variance of all three MOEs.

•  Stopping  rules  of  16  and  20  hours  appear  to  be  a  reasonable 
variance-reducing/realism-maintaining  compromise.  Avoiding  the  12-hour 
stopping  rule  and  using  a  25  percent  shrinkage  rule,  rather  than  a  50  percent 
rule,  can  reduce  the  likelihood  that  unrealistically  short  times  to 
detect/classify  are  generated  for  a  portion  of  the  test  period. 

•  MdSR  can  be  estimated  directly  from  the  observed  events.  SR  can  be 
estimated  using  an  MLE  technique  that  allows  for  the  incorporation  of 
censored  data. 

All of these interim conclusions are based on systems that perform as described in
Chapter II (i.e., Systems A and B).

D.  A COMPARISON OF SEARCH-RELATED MOEs: HYPOTHESIS TESTING
USING PARAMETRIC BOOTSTRAP TECHNIQUES

In  this  section,  we  compare  our  estimates  of  search-related  MOEs  obtained  from 
individual  simulated  OTs  (trials)  to  assumed  thresholds.  Importantly,  this  section  focuses 
the  examination  on  the  information  available  from  one  trial.  Of  course,  this  is  the  type  of 
information  that  would  be  available  after  a  real  test.  That  is,  at  the  conclusion  of  a  real 
OT,  only  one  set  of  events  exists,  not  100  as  has  been  the  case  in  our  analyses  to  this 
point.  The  thresholds  can  be  thought  of  as  Navy-defined  values  that  are  meant  to  aid  the 
evaluation  of  system  performance  based  on  OT  measurements.  Our  interest  in  this 
section  is  to  explore  the  impact  of  the  choice  of  test  conditions  and  test  control  rules  on 
these  comparisons  to  thresholds. 

Table III-3 presents estimates of the nominal values for each of the three
search-related MOEs for both System A and B. The nominal value of SR was obtained
by considering the parameters of the parent gamma distribution (i.e., SR = area size /
mean time to detect/classify = area size / (α × β)). For MdSR and MSR, 26,000 random
numbers were drawn from the appropriate gamma distribution and the individual search
rates (the sr_i values) were computed. MdSR and MSR were then computed, as described in
Chapter II, from these 26,000 values.

Table III-3. Nominal Values of MdSR, SR, and MSR (NM²/Hr)

MOE     System A    System B
MdSR    132         67
SR      106         53
MSR     260         133
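
The nominal values in Table III-3 can be roughly checked with a short Monte Carlo sketch. It assumes an 800 NM² area, a gamma shape of 1.66, and mean times to detect/classify of 7.54 hours (System A) and 15.08 hours (System B), as quoted earlier in this chapter. It also assumes that MdSR and MSR are the median and the mean of the individual search rates (area divided by each drawn time); the formal definitions are in Chapter II and may differ in detail. The function name is hypothetical.

```python
import random
import statistics


def nominal_moes(mean_time, area=800.0, shape=1.66, draws=26000, seed=0):
    """Return (SR, MdSR, MSR) in NM^2/Hr for a gamma time-to-detect model."""
    scale = mean_time / shape
    sr = area / (shape * scale)  # area size / mean time to detect/classify
    rng = random.Random(seed)
    rates = [area / rng.gammavariate(shape, scale) for _ in range(draws)]
    mdsr = statistics.median(rates)  # assumed definition of MdSR
    msr = sum(rates) / len(rates)    # assumed definition of MSR
    return sr, mdsr, msr
```

Under these assumptions SR is exact (800 / 7.54 is about 106), while the Monte Carlo MSR is unstable from seed to seed: for a gamma shape below 2 the reciprocal of the time to detect has infinite variance, so the sample mean of the search rates converges slowly. The analytic mean of the reciprocal, 1 / (β(α − 1)), gives about 267 NM²/Hr for System A, which is near the tabulated 260.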

For illustrative purposes, we consider thresholds for MdSR and SR between 40
and 160 NM²/Hr. For MSR, we double these values, and consider thresholds between 80
and 320 NM²/Hr. Table III-4 lists these thresholds and notes whether the system under
test, A or B, is expected to “pass” or “fail” the given threshold. In a sense, the “correct”
answers are given in Table III-4. The rest of this section examines how well a given test
situation (test conditions and control rules) allows one to discern these correct answers.

Table III-4. Pass/Fail Expectations for Three Search-Related MOEs

Thresholds (NM²/Hr)     System A                System B
MdSR, SR    MSR         MdSR    SR      MSR     MdSR    SR      MSR
40          80          Pass    Pass    Pass    Pass    Pass    Pass
50          100         Pass    Pass    Pass    Pass    Pass    Pass
60          120         Pass    Pass    Pass    Pass    Fail    Pass
70          140         Pass    Pass    Pass    Fail    Fail    Fail
80          160         Pass    Pass    Pass    Fail    Fail    Fail
90          180         Pass    Pass    Pass    Fail    Fail    Fail
100         200         Pass    Pass    Pass    Fail    Fail    Fail
110         220         Pass    Fail    Pass    Fail    Fail    Fail
120         240         Pass    Fail    Pass    Fail    Fail    Fail
130         260         Pass    Fail    Pass    Fail    Fail    Fail
140         280         Fail    Fail    Fail    Fail    Fail    Fail
150         300         Fail    Fail    Fail    Fail    Fail    Fail
160         320         Fail    Fail    Fail    Fail    Fail    Fail
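
The pass/fail expectations of Table III-4 follow directly from the nominal values of Table III-3. The sketch below assumes that a system is expected to pass when its nominal MOE value is greater than or equal to the threshold; the dictionary and function names are hypothetical.

```python
# Nominal values from Table III-3 (NM^2/Hr).
NOMINAL = {
    "A": {"MdSR": 132, "SR": 106, "MSR": 260},
    "B": {"MdSR": 67, "SR": 53, "MSR": 133},
}


def expectation(system, moe, threshold):
    """Expected outcome when the nominal MOE value is compared to a threshold."""
    return "Pass" if NOMINAL[system][moe] >= threshold else "Fail"


def expectation_table():
    """Rows of [MdSR/SR threshold, MSR threshold, then six Pass/Fail entries]."""
    rows = []
    for base in range(40, 161, 10):
        row = [base, 2 * base]  # MSR thresholds are double the MdSR/SR thresholds
        for system in ("A", "B"):
            for moe in ("MdSR", "SR", "MSR"):
                threshold = 2 * base if moe == "MSR" else base
                row.append(expectation(system, moe, threshold))
        rows.append(row)
    return rows
```

Each row of `expectation_table()` lists the two thresholds followed by the System A and System B entries in the column order of Table III-4.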

Using the parametric bootstrap procedures that were outlined in Chapter II, we
can, for an individual trial, compute the percent confidence that the system under test has
an actual search-related MOE value greater than or equal to the given threshold value.
Recall that this procedure involves first estimating the shape and scale parameters of the
assumed underlying gamma distribution from the observed times to detect/classify and
truncations using the MLE technique described in Chapter II. Next, this fitted gamma
distribution is resampled (2,000 × number of total events draws) to generate 2,000 bootstrap
samples of the appropriate search-related MOE. From these 2,000 bootstrap samples,
confidence intervals, for example, can be estimated.
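
A simplified version of the resampling step can be sketched as follows for SR. The sketch takes the fitted shape and scale as given (for example, the values 1.38 and 5.36 with 15 total events reported later for trial 1 of the System A run), assumes an 800 NM² area throughout, and does not apply the stopping rule to the bootstrap draws. The procedure of Chapter II may differ on each of these points, and the function names are hypothetical.

```python
import random


def bootstrap_sr(shape, scale, n_events, area=800.0, n_boot=2000, seed=0):
    """Generate bootstrap samples of SR = area / mean time to detect/classify."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n_boot):
        # One bootstrap sample: n_events draws from the fitted gamma distribution.
        times = [rng.gammavariate(shape, scale) for _ in range(n_events)]
        samples.append(area / (sum(times) / n_events))
    return samples


def percent_confidence(samples, threshold):
    """Percentage of bootstrap samples at or above the threshold."""
    return 100.0 * sum(s >= threshold for s in samples) / len(samples)
```

Calling `percent_confidence(bootstrap_sr(1.38, 5.36, 15), 110)` gives a value that can be compared, loosely, with the System A SR entries reported for that trial; exact agreement should not be expected given the simplifications and the random draws.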

Table III-5, in a format similar to Table III-4, presents the results of these
parametric bootstrap calculations for one particular trial of one test situation. The
asterisks in Table III-5 indicate the location of our pass/fail expectation of
system performance (as in Table III-4): each marks the last threshold that the system is
expected to pass. Figure III-19 presents the data of Table III-5
graphically. Figure III-19 allows one to compare, for this one trial, the “confidence
curves” for the three MOEs and two systems. The ideal test would result in a 0-100
step function occurring at the MOE’s nominal value (e.g., 106 NM²/Hr for System A’s SR).
The sharper the slope of the S-shaped curves shown in Figure III-19, the better the chance
of discerning specific system performance versus a given threshold. Given this
relationship, it is clear from Figure III-19 that MSR represents the MOE that would be
most difficult to discern from a given threshold (for this trial). Alternatively, SR appears
to be the MOE that most likely would lead to a statistically based conclusion when its
point estimate is compared to a given threshold.

Table III-5. Percent Confidence That One Can Ascribe to the Claim That the System
Attains the Given Threshold: Trial 1 of Run {A|8|4|20|25} and Trial 1 of Run {B|8|4|20|25}

Thresholds (NM²/Hr)     System A                System B
MdSR, SR    MSR         MdSR    SR      MSR     MdSR    SR      MSR
40          80          100     100     100     99      95      98
50          100         100     100     100     92      73*     93
60          120         100     100     100     77*     42      84*
70          140         100     99      98      58      20      73
80          160         98      93      96      43      9       62
90          180         94      82      91      31      3       53
100         200         87      65*     85      21      1       45
110         220         79      48      79      13      1       38
120         240         70      34      72      9       0       33
130         260         61*     23      65*     6       0       28
140         280         50      15      57      4       0       25
150         300         40      10      51      3       0       22
160         320         34      6       46      2       0       20

*  Last threshold that the system is expected to pass (Table III-4).

Figure III-19. Percent Confidence That One Can Ascribe to the Claim That the System
Attains the Given Threshold: Trial 1 of Runs {A|8|4|20|25} and {B|8|4|20|25}

Given the way in which we currently are doing these calculations, on a personal
computer  spreadsheet,  it  would  be  impractical  for  the  parametric  bootstrap  examinations 
described  above  to  be  done  for  all  9,600  trials  that  were  simulated.  (The  implication  is 
that  9,600  x  2,000,  or  about  19.2  million  bootstrap  samples  would  be  required,  with  each 
bootstrap  sample  requiring  between  4  and  21  random  draws  from  the  differing  fitted 
gamma  distributions,  plus  all  the  appropriate  calculations  of  MOEs.)  Rather  than  a 
complete  parametric  bootstrap  examination,  we  focus  on  a  few  illustrative  trials.  Our 
goal  is  to  describe  the  potential  impact  of  differing  test  control  rules  and  to  describe  the 
potential  value  and  limitations  of  the  various  search-related  MOEs. 

Table III-6 shows the total number of events and events taken to completion for
the first 10 trials of a comparable System A and System B run, with both runs involving
an 8-day test, an assumed average time for localization and attack of 4 hours, a stopping
rule of 20 hours, and a shrinkage percentage of 25. The MLE-fitted gamma parameters,
α' and β', are also shown in Table III-6. The shape and scale parameters for the fitted
gamma distribution can differ quite substantially for these relatively small sample size
MLE fits.

Table III-6. First 10 Trials of Runs {A|8|4|20|25} and {B|8|4|20|25} a

           System A                        System B
Trial #    Tev    EvTC    α'      β'       Tev    EvTC    α'      β'
1          15     14      1.38    5.36     12     9       1.22    11.55
2          15     15      1.65    3.64     11     9       2.54    4.92
3          16     16      1.51    3.90     12     10      1.15    10.19
4          16     16      1.22    4.62     12     11      1.96    5.22
5          16     16      2.13    2.78     13     11      1.00    13.20
6          15     15      2.83    2.16     12     10      2.83    3.99
7          15     15      2.00    3.02     11     10      5.84    1.97
8          13     12      1.14    7.32     12     10      2.30    5.30
9          14     12      1.00    9.83     12     10      2.00    6.29
10         15     14      1.00    6.69     12     9       1.08    15.14

a  Tev = total number of events, EvTC = events taken to completion, α' = estimated gamma
shape parameter, β' = estimated gamma scale parameter. SR is given in NM²/Hr.

Figure III-20 presents the confidence curves for these 10 trials as was done for
trial 1 in Figure III-19. Figure III-20 shows how these confidence curves, which one
could build after a particular test, can vary, at least for our gamma distributed times to
detect/classify and given the application of our test control rules. Again, it is clear that
hypothesis testing of MSR, versus some threshold, would be least likely to lead to a
definitive, statistically based conclusion.

Figure III-20. Percent Confidence That One Can Ascribe to the Claim That the System
Attains the Given Threshold: Trials 1 Through 10 of Runs {A|8|4|20|25} and {B|8|4|20|25}

Next, we consider the length of the 80 percent interval associated with the 2,000
bootstrap  MOE  samples.  We  normalize  this  interval  to  the  expected  MOE  value  (Table 
III-3)  and  create  the  unitless  IL/PE  value,  this  time  from  the  bootstrap  sample.  We  also 
consider  the  percentage  of  bootstrap  samples  (out  of  a  total  of  2,000)  that  are  greater  than 
or  equal  to  the  expected  MOE  value.  We  refer  to  this  as  the  approximate  percent 
confidence  (PercConf)  that  one  can  ascribe  to  the  claim  that  the  system  attains  the  given 
threshold.  Ideally,  one  would  like  the  chosen  MOE  to  have  an  IL/PE  value  near  zero  for 
all  trials  and  a  PercConf  value  clustered  around  one  point  for  all  trials. 
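The two summary quantities just defined can be sketched in a few lines. This is a simplified illustration rather than the procedure used in the study: it assumes that each bootstrap replicate draws uncensored gamma times from the fitted parameters, that SR is the 800 NM2 start area divided by the average time, and that the 80 percent interval is the central interval between the 10th and 90th percentiles with linear interpolation. The test control rules and censoring are not modeled, and all function names are illustrative.

```python
import random

START_AREA_NM2 = 800.0  # start area used for all runs (Table B-1)


def bootstrap_sr(shape, scale, n_events, n_boot=2000, seed=0):
    """Parametric bootstrap: one SR value per synthetic trial.

    Assumes SR = area / mean time and ignores censoring.
    """
    rng = random.Random(seed)
    rates = []
    for _ in range(n_boot):
        times = [rng.gammavariate(shape, scale) for _ in range(n_events)]
        rates.append(START_AREA_NM2 * n_events / sum(times))
    return rates


def quantile(ordered, q):
    """Linearly interpolated quantile of a sorted list (0 <= q <= 1)."""
    pos = q * (len(ordered) - 1)
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (pos - lower) * (ordered[upper] - ordered[lower])


def il_over_pe(samples, expected_moe, coverage=0.80):
    """Central interval length normalized to the expected MOE value."""
    ordered = sorted(samples)
    tail = (1.0 - coverage) / 2.0
    length = quantile(ordered, 1.0 - tail) - quantile(ordered, tail)
    return length / expected_moe


def perc_conf(samples, expected_moe):
    """Percentage of bootstrap samples at or above the expected MOE value."""
    hits = sum(1 for s in samples if s >= expected_moe)
    return 100.0 * hits / len(samples)


if __name__ == "__main__":
    # Fitted parameters for System A, trial 1 (Table III-6); the threshold
    # of 106 NM2/Hr is roughly 800 / 7.54 and is used only as an example.
    rates = bootstrap_sr(shape=1.38, scale=5.36, n_events=14)
    print(il_over_pe(rates, 106.0), perc_conf(rates, 106.0))
```

A narrow interval (IL/PE near zero) together with a PercConf that is stable from trial to trial corresponds to the ideal behavior described above.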

For  each  of  the  20  trials  described  in  Table  III-6,  we  present  a  scatterplot  of 
PercConf  versus  IL/PE  (Figure  III-21).  Each  of  the  points  in  Figure  III-21  represents  a 
different  trial  (1  through  10)  or  a  different  MOE  (MdSR,  SR,  or  MSR).  This  scatterplot 
is  meant  to  show  the  interaction  between  IL/PE  and  PercConf.  The  between-trials 
variance associated with the sorts of conclusions that one might come to if MSR were
used as the search-related MOE is apparent from Figure III-21. In particular, for the
trials in which the IL/PE value is relatively low for MSR (below 1.0), the PercConf value
is also low (below 60 percent). Alternatively, for those trials that led to higher PercConf
values for MSR, the IL/PE values are large.

Figure III-21. Scatterplot of Percent Confidence That Actual MOE Value Is Attained Versus
IL/PE for the First 10 Trials of Runs {A|8|4|20|25} and {B|8|4|20|25}

SR and MdSR appear to have similar behavior for these 10 trials, with MdSR
being  slightly  more  dispersed  in  IL/PE  and  SR  being  slightly  more  dispersed  in  PercConf. 
Given  the  above-described  ideal  requirements,  SR  and  MdSR  appear  to  be  of  similar 
merit  with  respect  to  their  usage  as  measures  to  be  compared  to  predefined  thresholds. 

Table  III-7  considers  the  first  trial  of  32  different  runs.  This  table  reports  the 
MLE-fitted  gamma  parameters  and  estimated  SR  for  the  first  trial  of  each  of  these  32 
different  test  situations. 


Table III-7. First Trial of 32 Different Runsᵃ

                      |             System A              |             System B
Case #  Run           | Tev   EvTC    α'      β'     SR   | Tev   EvTC    α'      β'     SR
--------------------- +-----------------------------------+-----------------------------------
   1    {8|4|24|50}   |  14    14    1.40    5.19   110   |  12    10    1.26   10.91    58
   2    {8|4|24|25}   |  14    14    1.40    5.19   110   |  12    10    1.26   10.91    58
   3    {8|4|20|50}   |  15    14    1.38    5.36   108   |  13    11    1.39    9.66    60
   4    {8|4|20|25}   |  15    14    1.38    5.36   108   |  12     9    1.22   11.55    60
   5    {8|4|16|50}   |  15    14    1.48    4.81   112   |  17    14    1.00   14.51    55
   6    {8|4|16|25}   |  15    14    1.48    4.81   112   |  13    10    1.37    9.98    59
   7    {8|4|12|50}   |  18    16    1.66    4.14   116   |  19    16    1.77    7.49    60
   8    {8|4|12|25}   |  17    15    1.58    4.49   113   |  16    12    1.55    9.27    56
   9    {4|4|24|50}   |   7     7    1.22    4.06   162   |   4     4    6.97    2.08    55
  10    {4|4|24|25}   |   7     7    1.22    4.06   162   |   4     4    6.97    2.08    55
  11    {4|4|20|50}   |   7     7    1.22    4.06   162   |   5     4    5.41    2.60    57
  12    {4|4|20|25}   |   7     7    1.22    4.06   162   |   5     4    5.41    2.60    57
  13    {4|4|16|50}   |   8     7    1.00    7.23   111   |   7     5    1.00   12.68    63
  14    {4|4|16|25}   |   8     7    1.00    7.23   111   |   7     5    1.00   12.68    63
  15    {4|4|12|50}   |   8     7    1.06    6.23   121   |   8     5    1.04   12.82    60
  16    {4|4|12|25}   |   8     7    1.06    6.23   121   |   8     5    1.04   12.82    60

ᵃ Tev = total number of events, EvTC = events taken to completion, α' = estimated gamma
shape parameter, β' = estimated gamma scale parameter.

It was often true that small changes in test control rules, for instance cases 1 and
2, led to no changes in the simulated outcome of the OT.¹¹ Therefore, the number of
trials, number of truncations, and observed times to detect/classify for these cases with
similar test control rules would be identical. This phenomenon can be recognized in Table
III-7 as the identical pairs of estimated gamma parameters (e.g., cases 1 and 2 or 15 and
16).

For the 8-day tests of System A, shown in Table III-7, estimates of SR vary from
108 to 116 NM2/Hr. Four-day estimates of SR vary much more, from 111 to 162
NM2/Hr. For System B, these particular trials showed little variation in estimates of SR.
See Table III-7.
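The tabulated SR values appear consistent with dividing the 800 NM2 start area by the mean of the fitted gamma distribution, α'β'. The following sketch checks that reading against a few System A rows of Table III-7. The relation is an assumption inferred from the table and from the A/Avg(NT) definition in Table B-2, not a formula quoted from the text, and the function name is illustrative.

```python
START_AREA_NM2 = 800.0  # start area for all runs (Table B-1)


def search_rate_from_gamma(shape, scale, area=START_AREA_NM2):
    """SR in NM2/Hr, assuming SR = area / (fitted gamma mean)."""
    return area / (shape * scale)


# (shape, scale, tabulated SR) for selected System A cases in Table III-7.
SYSTEM_A_ROWS = {
    1: (1.40, 5.19, 110),
    4: (1.38, 5.36, 108),
    7: (1.66, 4.14, 116),
    9: (1.22, 4.06, 162),
    13: (1.00, 7.23, 111),
}

if __name__ == "__main__":
    for case, (shape, scale, tabulated) in SYSTEM_A_ROWS.items():
        computed = round(search_rate_from_gamma(shape, scale))
        print(f"case {case:2d}: computed {computed}, tabulated {tabulated}")
```

Under this reading, the wide 4-day range (111 to 162 NM2/Hr) follows directly from how far the fitted mean α'β' moves when only seven or eight events are available.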

From the runs of Table III-7, plots of PercConf and IL/PE versus run conditions
are shown for Systems A and B in Figure III-22. Given the previously described
deficiency with MSR as an MOE, namely, relatively large variance associated with
estimates of MSR, it has not been included in Figure III-22. In addition, only the
non-redundant cases are plotted in Figure III-22. For example, System A case 1 is
presented, but not System A case 2.

With respect to System A, the impact of test duration is most apparent. In
particular, the IL/PE value, our measure of variance, increases dramatically for the
shorter test (case numbers 9, 11, 13, and 15 for System A). For this trial, it is also seen
that dropping the stopping rule from 24 to 12 hours and using a 50 percent vice 25
percent shrinkage rule (case 7) led to the smallest values of IL/PE and did not
appreciably affect the PercConf estimates. We note that, for System B, the PercConf
value associated with the case 7 SR rises above all others (for System B). This may be a
manifestation of the upward biasing of SR when the 12-hour stopping rule and 50 percent
shrinkage rule are employed together, as discussed earlier.¹² The final feature associated
with this figure is the unexpectedly low PercConf associated with MdSR for System B
cases 9 and 11. As can be seen in Table III-7, the MLE-fits for these very small sample
size cases (four events) can vary substantially and hence lead to very different
conclusions, given that the aforementioned parametric bootstrap methodology is
employed for threshold comparisons. For these data, the SR MOE appears to be
somewhat more robust to this variance mechanism. That is, although the fitted shape and
scale parameters are quite different for the eight 4-day System B tests, the computed
PercConf values vary by less than 20 percent (from 60 to 72) for SR. The comparable
MdSR values vary by 217 percent (from 23 to 73).

¹¹ Recall that the same set of random numbers was used for each test situation.

¹² The same biasing effect on SR can be seen in case 7 for System A (Figure III-23).


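[Figure: two panels. The horizontal axis of each panel is the case number from Table III-7
(cases 1, 3, 5, 7, 8, 9, 11, 13, and 15 in the left panel; cases 1, 3, 4, 5, 6, 7, 8, 9, 11,
13, and 15 in the right panel).]

Figure III-22. Impact of Test Control Rules on Percent Confidence
That MOE Value Is Attained and IL/PE for MdSR and SR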


E.  CONCLUSIONS 

With  respect  to  the  employment  of  test  control  rules,  the  analyses  of  this  chapter 
support  the  following  conclusions: 

1.  Employing  stopping  rules  for  free-play  ASW  search  OT  can  increase  the 
number  of  encounters  generated  during  the  test  and  maintain  elements  of  test 
realism.  The  use  of  such  rules  will  be  particularly  valuable  when  the  system 
under  test  performs  significantly  worse  than  pre-test  expectations. 

2. Longer test periods (on the order of 8 days or more rather than 4 days) are
more likely to be positively affected by the test control rules described in this
document. That is, free-play test durations of 4 days or fewer will be only
minimally affected by the rules described in this document.

3. The use of the 12-hour stopping rule with a 50 percent shrinkage rule led to
unrealistically short times to detect/classify for some trials. For the system
performances examined (i.e., Systems A and B), stopping rules of 16, 20, and
24 hours, used in concert with 50 or 25 percent shrinkage rules, appeared
satisfactory from this perspective.

4.  With  respect  to  the  search-related  MOEs  that  were  investigated: 

•  In  the  case  of  SR  and  MdSR,  stopping  rules  of  16  and  20  hours 
appeared  to  represent  a  reasonable  variance-reducing/ 
realism-maintaining  compromise. 

•  The  MSR,  because  of  the  large  variance  associated  with  its  estimation, 
does  not  appear  to  be  a  good  choice  for  a  search-related  MOE. 

•  Given  the  employment  of  the  test  control  rules  described  in  this 
document,  both  MdSR  and  SR  appear  to  represent  satisfactory  search- 
related  MOEs.  Whereas  MdSR  can  be  directly  estimated  from  the 
observed  events,  an  MLE  procedure  should  be  used  to  include  censored 
data  in  estimates  of  SR. 

•  Given  a  “set  of  observations”  (trial),  a  parametric  bootstrap  technique 
can  be  used  to  estimate  the  given  search-related  MOE  and  to  attach 
confidence  intervals.  In  addition,  in  the  case  of  MdSR  and  SR,  this 
technique  can  be  used  to  arrive  at  statistically  based  conclusions  (e.g., 
hypothesis testing) relative to predefined thresholds.

ASW            Anti-Submarine Warfare
Avg            Average
BARSTUR        Barking Sands Tactical Underwater Range
BSURE          Barking Sands Underwater Range Expansion
C.I.           Confidence Interval
COI            Critical Operational Issue
COMEX          Commencement of the Exercise
COMOPTEVFOR    Commander, Operational Test and Evaluation Force
DOT&E          Director Operational Test and Evaluation
EvTC           Events Taken to Completion
FASz           Final Area Size
FINEX          Finish Exercise
FOM            Figure-of-Merit
Hr             Hour
IDA            Institute for Defense Analyses
IL/PE          Interval Length / Point Estimate
L&A            Localization and Attack
Loc            Localization
MDR            Median Detection Range
MdSR           Median Search Rate
MLE            Maximum Likelihood Estimation
MOE            Measure of Effectiveness
MOP            Measure of Performance
MSR            Mean Search Rate
NM2            Square Nautical Miles
NT             Normalized Time
NT(C)          Normalized Time With Censored Data
OPEVAL         Operational Evaluation
OPTEVFOR       Operational Test and Evaluation Force
OT             Operational Test
OTD            Operational Test Director
OT&E           Operational Test and Evaluation
PercConf       Percent Confidence
Repo           Reposition
RndT           Random Time
RndT(C)        Random Time With Censored Data
SR             Search Rate
Tev            Total Events

This appendix presents the results, in a table, of the 96 test situations that were
simulated in this study. Each situation or run is characterized by a different set of initial
test conduct/test control conditions. Definitions for each of these test conduct/test control
initial conditions are given in Table B-1. Similarly, Table B-2 identifies and defines the
various output measures (results) that are presented. Finally, Table B-3 presents the
information described in Tables B-1 and B-2 for each of the 96 runs. For each run (i.e.,
test situation) and each output measure, listings describing the frequency of occurrence
for various values are provided. These listings are suitable for defining histograms for
each measure and, as such, provide an estimate of how each measure is distributed.

Table B-1. Test Situation Input Parameters

Parameter      Description
-----------    ---------------------------------------------------------------------------
SYSTEM         “A” corresponds to a nominal system with an average time to detect/classify
               of 7.54 hours and “B” corresponds to a system with an average time to
               detect/classify of 15.08 hours.

Start Area     This value represents the area size at the start of the test. This value was
               chosen as 800 NM2 for all runs.

Days           This number corresponds to the length of the simulated OT (or trial) (4 or
               8 days).

Loc Time       This number corresponds to the time added to each detection/classification
               for localization and attack (2, 4, or 8 hours).

Repo Time      This number (in hours) corresponds to the time added to each
               detection/classification or truncation for repositioning after the encounter
               or truncation.

Stop Time      This is the time (in hours) at which the OTD, given no
               detection/classification, stops the event (12, 16, 20, and 24).

Shrink %       This is the percentage (25 or 50) that the OTD shrinks the area size to be
               searched given he has observed two or more truncations during that simulated
               OT (trial).

6-Pt Time      If the running six-point average during a given simulated OT (trial) goes
               below this value, the OTD expands the area size for the next search. This
               value was always chosen as 1.5 hours for the 96 runs reported here.

5-Pt Time      If the running five-point average during a given simulated OT (trial) goes
               below this value, the OTD expands the area size for the next search. This
               value was always chosen as 1.0 hours for the 96 runs reported here.

Expand %       This is the percentage that the OTD expands the area size to be searched
               given the six-point or five-point expansion rules have been triggered. This
               number was chosen as 200 for all of the reported runs. (See footnote 19 of
               Chapter II, page II-17.)

Table B-2. Description of Output Measures

Measure        Description
-----------    ---------------------------------------------------------------------------
EvTC           The number of events per trial that were completed with a simulated
               encounter.

Tev            The total number of events per trial. That is, events that ended with a
               simulated encounter or OTD-forced truncation are included.

FASz           The number of trials in which the area size being searched (in NM2) on the
               final event was of a specified value.

RndT           The number of trials in which the average time to detection/classification
               (of the random draw, in hours) was of a specified value.

RndT(C)        The number of trials in which the average time to detection/classification
               (of the random draw with the censoring (truncations) due to the test control
               rules, in hours) was of a specified value. That is, this value corresponds
               to a censored average.

MdSR           The frequency of the computed median search rates (in NM2/Hr).

A/Avg(NT)      The frequency of the computed search rates (SR) (in NM2/Hr and normalized to
               800 NM2) based solely on the random draws. (See page II-16, Equation II-5a
               for definition of SR.)

A/Avg(NT(C))   The frequency of the computed search rates (SR) (in NM2/Hr) based on the
               random draws with censoring due to the test control rules.

MSR            The frequency of the computed mean search rates (in NM2/Hr) based solely on
               the random draws.

MSR(C)         The frequency of the computed mean search rates (in NM2/Hr) based on the
               random draws with censoring due to the test control rules.
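The area-adjustment parameters of Table B-1 (Shrink %, 6-Pt Time, 5-Pt Time, and Expand %) can be read as a simple update rule applied between events. The sketch below is one possible reading only, and several details are assumptions not confirmed by the text available here: that shrinking by p percent multiplies the area by (1 - p/100), that an Expand % of 200 doubles the area, that the running averages are taken over the most recent event times, and that expansion takes precedence over shrinkage. The function and argument names are hypothetical.

```python
def next_area(area, truncations, recent_times, shrink_pct=25.0,
              six_pt_hours=1.5, five_pt_hours=1.0, expand_pct=200.0):
    """Return the search area (NM2) for the next event.

    area         -- current area size in NM2
    truncations  -- OTD-forced truncations observed so far in this trial
    recent_times -- event times (hours) observed so far, oldest first
    """
    def running_avg(k):
        # Average of the k most recent times, or None if fewer exist.
        if len(recent_times) < k:
            return None
        return sum(recent_times[-k:]) / k

    six = running_avg(6)
    five = running_avg(5)
    # Assumed precedence: expansion is checked before shrinkage.
    if (six is not None and six < six_pt_hours) or \
            (five is not None and five < five_pt_hours):
        return area * expand_pct / 100.0   # assumed: 200 means doubling
    if truncations >= 2:
        return area * (1.0 - shrink_pct / 100.0)  # assumed: shrink BY pct
    return area


if __name__ == "__main__":
    print(next_area(800.0, truncations=2, recent_times=[20.0, 20.0]))
    print(next_area(800.0, truncations=0, recent_times=[1.0] * 6))
```

Keeping the rule in a single pure function makes it easy to substitute a different reading (for example, shrinking to rather than by the stated percentage) without touching the rest of a simulation.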

Table B-3. Simulation Results

[Table body not recoverable. Surviving column headings include RndT, RndT(C), MdSR,
A/Avg(NT), and A/Avg(NT(C)); the measures are defined in Table B-2.]